Abstract

It is well known that an (n, k) code can be used to store information in a distributed storage system with n
nodes/disks. If the storage capacity of each node/disk is normalized to one unit, the code can be used to store k units
of information, where n > k. If the code used is maximum distance separable (MDS), then the storage system can
tolerate up to (n − k) disk failures (erasures), since the original information can be reconstructed from any k surviving
disks. The focus of this paper is the design of a systematic MDS code with the additional property that a single
disk failure can be repaired with minimum repair bandwidth, i.e., with the minimum possible amount of data to be
downloaded for recovery of the failed disk. Previously, a lower bound of (n − 1)/(n − k) units was established by Dimakis
et al. on the repair bandwidth for a single disk failure in an (n, k) MDS code based storage system, where each of
the n disks stores 1 unit of data. Recently, the existence of asymptotic codes achieving this lower bound for arbitrary
(n, k) has been established by drawing connections to an asymptotic interference alignment scheme developed by
Cadambe and Jafar for the interference channel. While the recent asymptotic constructions show the existence of
codes achieving this lower bound in the limit of large code sizes, finite code constructions achieving this lower bound
existed in previous literature only for the special (high-redundancy) scenario where k ≤ max(n/2, 3). The question
of existence of finite codes for arbitrary values of (n, k) achieving the lower bound on the repair bandwidth remained
open. As the main contribution of this paper, we provide the first known construction of a finite code for arbitrary
(n, k) which can repair a single failed systematic disk by downloading exactly (n − 1)/(n − k) units of data. The codes, which
are optimally efficient in terms of repair bandwidth, are based on permutation matrices¹. We also show that our code has
a simple repair property which enables efficiency not only in terms of the amount of repair bandwidth, but also in
terms of the amount of data accessed on the disk. We also generalize our permutation matrix based constructions by
developing a novel framework for repair-bandwidth-optimal MDS codes based on the idea of subspace interference
alignment, a concept previously introduced by Suh and Tse in the context of wireless cellular networks.



This paper will be published, in part, in the Proceedings of the IEEE International Symposium on Information Theory (ISIT) 2011 [1].
¹ The permutation matrix based constructions of this paper have been discovered in parallel by Tamo et al. in [2].



I. Introduction 

Consider a distributed storage system with n distributed data disks, with each disk storing one unit of data.
Assume that the amount of information to be stored in this storage system is equal to k units, where k < n, with
the extra storage space of n − k units used to build redundancy in the system. Then, it is well known that the
optimal tolerance to failures, for a fixed amount of storage, can be provided by using an (n, k) maximum distance
separable (MDS) erasure code to store the data. Such a code would tolerate any (n − k) disk failures (erasures),
since the MDS property ensures that the original information can be recovered by using any k surviving disks.
When a disk fails, efficient (fast) recovery of the failed disk(s) is important, since replacing the failed disk(s)
before other disks fail reduces the chance of data loss and improves the overall reliability of the system. While
an MDS code based storage system can tolerate a worst-case failure scenario of n − k disks, the most common failure
scenario in a storage system is the case where a single disk fails. A problem that has received considerable attention
in recent literature [3]–[13], and is the focus of this paper, is the recovery efficiency (speed) of a single disk failure
in an MDS code for distributed storage systems. The increased interest in efficient repair for erasure codes stems, in
part, from connections of the problem to various important topics in information and coding theory. First, naturally,
the problem is related to the classical field of erasure coding for storage. Second, as demonstrated in [3], the
problem of efficient repair is connected to network coding. In particular, it is connected to the multi-source network
coding problem with generalized (i.e., non-multicast) demands, a classical open problem in network information
theory. Finally, as demonstrated in references [4]–[6], it is connected to the interference management strategy of
interference alignment, a technique widely studied in the context of wireless communications. This final connection
will be explored in detail in this paper.

When a single node fails in a storage system, a new node enters the storage system, connects to the surviving
d = n − 1 disks via a network, downloads data from these (n − 1) surviving disks, and reconstructs the (data stored
in the) failed node². The primary factor in determining the speed of recovery of the failed node is the amount of
time taken for the new node to download the necessary data from surviving disks, which, in turn, depends on the
amount of data accessed and downloaded by the new node³. This problem has been studied from the perspective
of the amount of data to be downloaded, also known as the repair bandwidth, by the new node for successful
recovery of the failed node in [3]–[11]. Note that a trivial repair strategy for any (n, k) MDS code achieves a
repair bandwidth of k units for a single failed disk. This is because the entire original data, and hence the failed
disk, can be recovered by the new node reading any set of k surviving disks completely. A natural question of
interest is the following: what is the minimum repair bandwidth required for a single failed node in an MDS code
based distributed storage system? A cut-set lower bound for this question, i.e., for the minimum repair bandwidth,
was derived to be (n − 1)/(n − k) ≤ k units in reference [3]. The question of whether this lower bound is achievable via
code constructions has received considerable attention in recent literature [3]–[10]. In particular, recent literature
has made progress on this problem of minimum repair bandwidth by drawing connections to the wireless
interference management technique of interference alignment. Results in the current literature related to this
problem are summarized below.

1) Finite Codes for Low Rates: By connecting the problem of exact repair to the wireless interference management
technique of interference alignment, codes which achieve the repair-bandwidth lower bound of (n − 1)/(n − k) have been
found in [11] for the case where k ≤ max(n/2, 3). In other words, if the rate, k/n, of the code is
smaller than or equal to one half, finite explicit MDS code constructions exist which can repair a failed node with
a repair bandwidth of (n − 1)/(n − k) units. The repair bandwidth was achieved with the new node downloading 1/(n − k)
units from each of the n − 1 surviving nodes.

2) Asymptotic Codes for Arbitrary (n, k): For arbitrary (n, k), references [7], [8] used the asymptotic interference
alignment scheme constructed in reference [15], in the context of wireless interference channels, to generate
codes which achieve the optimal repair bandwidth of (n − 1)/(n − k) asymptotically as the size of the code becomes
arbitrarily large.
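The two reference points in this discussion, the trivial strategy (k units) and the cut-set bound ((n − 1)/(n − k) units, i.e., 1/(n − k) units from each surviving node), can be compared with exact rational arithmetic. The Python sketch below is only an illustration: the helper names and the sample (n, k) pairs are our own choices, and it assumes each node stores one unit and that the bound is met with equal downloads from every surviving node.

```python
from fractions import Fraction


def trivial_repair_bandwidth(n: int, k: int) -> Fraction:
    """Read any k surviving nodes completely and decode everything: k units."""
    return Fraction(k)


def cutset_repair_bandwidth(n: int, k: int) -> Fraction:
    """Cut-set lower bound on single-node repair bandwidth: (n - 1)/(n - k) units."""
    return Fraction(n - 1, n - k)


def per_node_download(n: int, k: int) -> Fraction:
    """Share of each of the n - 1 surviving nodes downloaded when the bound is met evenly."""
    return Fraction(1, n - k)


if __name__ == "__main__":
    # Illustrative (n, k) pairs; (14, 10) is a high-rate example with k/n > 1/2.
    for n, k in [(4, 2), (5, 3), (14, 10)]:
        print(f"(n, k) = ({n}, {k}): trivial = {trivial_repair_bandwidth(n, k)}, "
              f"cut-set = {cutset_repair_bandwidth(n, k)}, "
              f"per node = {per_node_download(n, k)}")
```

For instance, for the high-rate example (n, k) = (14, 10), the bound is 13/4 units against 10 units for the trivial strategy. The bound never exceeds k, since k(n − k) − (n − 1) = (k − 1)(n − k − 1) ≥ 0.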

While the above results are interesting from a theoretical perspective, a matter of relevance for several storage
systems in practice is efficient repair strategies for high-rate codes, i.e., for storage systems that have a small
number of parity nodes as compared to the number of systematic nodes and therefore operate in the regime where
k/n > 1/2. While these asymptotic constructions provide an interesting theoretical limit to what practical codes
can achieve, the existence of finite codes achieving a repair bandwidth of (n − 1)/(n − k) units remained an open problem of
practical interest. In fact, for arbitrary (n, k), the construction of finite codes having a repair strategy more efficient
than the trivial repair strategy with a repair bandwidth of k units remained open. It is this open problem that is the
main focus of this paper. We shall next take a closer look at this open problem from the perspective of the literature
and techniques associated with interference alignment.

² In this paper, we restrict ourselves to exact repair, where the new node has to be a replica of the failed node. Note that this is unlike [3],
[14], which consider functional repair, where the new node only has to be information-equivalent to the failed node.

³ There is a subtle difference between the amount of data accessed and the amount downloaded; such differences are explored later in Section II-A.

A. Connections of Repair Bandwidth to Interference Alignment 

In the context of linear codes (which suffices for this paper), the connections between exact repair and interference 
in wireless systems can be understood as follows. Consider an (n, k) systematic code, where the first k nodes are 
systematic and hence store k (uncoded) independent sources, each of size one unit. The remaining n − k nodes are
parity nodes. Each parity node stores a linear combination of the k sources, where the combinations are defined by
the code generator matrix. Now, suppose that a node, say the first node, fails. In order to repair this node, we assume
that the new node downloads a certain set of linear combinations from each of the n − 1 surviving nodes. The
goal is to recover the first source from this set of linear combinations. The k − 1 surviving systematic nodes store
information that is independent of the first source. The information of this first source is stored in the n − k parity
nodes, but this desired information in the parity nodes is "mixed" with the remaining (k − 1) sources corresponding
to the remaining k − 1 systematic nodes. These k − 1 sources which are not required by the new node, but arrive
in the linear combinations downloaded from the parity nodes because they are "mixed" with the first source are 
analogous to interference in wireless communication systems. The coding matrices, which define how the sources 
are mixed into parity nodes, are analogous to channel matrices in wireless communications which also perform the 
same function. The linear combinations downloaded by the new node to repair the failed node are analogous to 
the beamforming vectors in wireless communications (see, e.g., [6], [7]). In both applications, the greater
the extent of alignment, the more efficient the system. In the wireless context, interference alignment reduces
the footprint of the interference at a receiver and frees a greater number of dimensions for the desired signal (and
typically leads to an improved number of degrees of freedom [15]). In the repair context, interference alignment reduces
the footprint of the interfering sources at the new node, and hence fewer units need to
be downloaded to cancel this interference. However, one important difference exists: in the wireless context, the
channel matrices are given by nature and cannot be controlled, whereas, in the storage context, the coding matrices 
are a design choice. 

The approach of references [7], [8] in asymptotic code construction essentially stemmed from mimicking the
wireless interference channel matrices in code construction. These references used diagonal coding sub-matrices
analogous to those obtained using symbol extensions and vector coding in wireless channels without inter-symbol
interference. The surprising insight of these references is that, even though there is additional freedom in the storage
context as compared to the wireless context because the coding matrices can be designed, the cut-set lower bound 
can be achieved asymptotically by mimicking the wireless channel matrices for coding in the storage context. In 
other words, there is no loss from the perspective of the extent of alignment, in an asymptotic sense, when the 
wireless channel matrices are used for coding in the storage context. Because the coding matrices are analogous 
to the channel matrices in wireless context, the size of the code is similar to the size of the channel matrices (or 
the symbol extensions used). In the wireless context of naturally occurring channel matrices, asymptotically large
channel matrices (and, more generally, an asymptotically large amount of diversity) are necessary in general to achieve
the maximum extent of alignment, at least with linear schemes [15], [16]. However, the existence of finite codes
for storage is related to the following question: if we have the freedom to design these coding (channel) matrices,
can we achieve the desired extent of alignment with finite-size matrices, or are asymptotic schemes unavoidable,
much like in the wireless context? It is worth noting that the interference alignment literature contains examples of
wireless channels with certain special channel matrices where interference alignment is indeed achieved with
finite-size channel matrices [13], [17], [18]. Of relevance to this work is reference [18], which shows that if the
channel matrices have a specific tensor (Kronecker) product structure, then alignment is possible with finite-size 
channel matrices using the notion of subspace interference alignment. While these examples serve the purposes of 
simplifying the concept of alignment for exposition, their practical applicability in the wireless context is limited, 
because of the nature of the wireless channel. In the storage context, however, the coding (channel) matrices are
a design choice; in this paper, we exploit this flexibility and the insights of the interference alignment literature (and
reference [18], in particular) to develop finite-size code constructions for distributed storage.

Before we proceed, we note that there exists, in the literature, a parallel line of work which studies the repair
bandwidth for codes which are not necessarily MDS and hence use a greater amount of storage for a given amount
of redundancy [3], [14], [19], [20]. These references study the trade-off between the amount of storage and the
repair bandwidth required, for a given amount of redundancy. Further, we also note that the design of codes from the
perspective of efficient recovery of their information elements for error correction (rather than erasure correction) has
also been studied in the literature in association with locally decodable codes (see [21] and references therein). The
focus of this paper, however, is on MDS erasure codes (also referred to as minimum storage regenerating codes),
i.e., (n, k) codes which can tolerate any (n − k) erasures.

II. Summary of Contributions 

The main contribution of this paper is the design of a new class of MDS codes which achieve the minimum repair
bandwidth of (n − 1)/(n − k) units for the repair of a single failed systematic node. Our constructions operate with the new
node downloading 1/(n − k) units from each of the n − 1 surviving nodes for repair. The code constructions presented
in this paper are listed below. 

1) Permutation Matrix Based Codes for General (n, k)⁴: In Section V, we present a construction of codes which
achieve the repair bandwidth lower bound of (n − 1)/(n − k) units for repair of systematic nodes for any tuple (n, k)
where n > k. The code generator sub-matrices of the construction are based on permutation matrices. The
code construction, albeit finite, is based on random coding, with the random coding argument used to justify
the existence of a repair-bandwidth optimal MDS code. This means that, for any arbitrary (n, k), a brute-force
search over a (finite) set of codes described in that section will yield a repair-bandwidth optimal MDS code.

2) Explicit Construction for n − k ∈ {2, 3}: While Section V describes a random coding based construction, we
also provide in Section VI explicit constructions for the special case of n − k ∈ {2, 3}.

3) Subspace Interference Alignment Framework for Optimal Repair: In Section VII, we connect the idea of
interference alignment via tensor (Kronecker) products, originally introduced in [18], to the permutation matrix
based codes developed in Section V. The tensor-product based alignment framework, also termed subspace
alignment in [18], provides a generalization of the permutation matrix based codes developed in Section V
and leads to the development of a family of MDS codes with optimal repair bandwidth.

It must be noted that the search for codes which efficiently repair both systematic and parity nodes is still open.
However, from a practical perspective, the step taken in this paper is important since, in most storage systems, the
number of parity nodes is small compared to the number of systematic nodes.

A. Efficient Code Construction in terms of Disk Access 

While most previous works described above explore the repair problem by accounting for the amount of 
information to be sent over the network for repair, there exists another important cost during the repair of a
node, namely, the amount of disk access. To understand the difference between these two costs, consider a toy example
where a disk stores two bits $a_1, a_2$. Now, suppose that, to repair some other failed node in this system,
the bit $a_1 + a_2$ has to be sent to a new node. This means that the bandwidth required from this particular disk is 1
bit. However, in many storage systems, the disk-read speed is slower than the network transfer speed and hence
becomes a bottleneck. In the case where the disk-read speed is a bottleneck, the defining factor in the speed of
repair is the amount of disk access rather than the repair bandwidth. In the toy example described, the amount of
disk access is 2 bits, as both $a_1$ and $a_2$ have to be read from the disk to compute $a_1 + a_2$. Thus, it is possible
that certain codes, while minimizing repair bandwidth, can perform poorly in terms of disk access, rendering the
codes impractical. In this paper, we will formalize this notion of disk access cost, and show that the codes based
on permutation matrices in Section V are not only bandwidth optimal, but also disk-access optimal, for the repair
of a single failed systematic node.



⁴ The authors of references [2], [22] have discovered this class of permutation-matrix based codes in parallel work.



III. A Linear Algebraic Problem 



We begin by describing a linear algebraic problem which lies at the core of repair-optimal MDS codes. In
particular, the problem described here is the problem we solve to find the optimal repair of (n = k + 2, k) codes.
We start with a simpler problem which lies at the core of the special case of the (n = 4, k = 2) repair-optimal
code, and later generalize the problem.



Problem 1: A Simple Feasibility Problem 
Consider the following set of equations. 
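$$\mathrm{span}(V_1 H_2) = \mathrm{span}(V_1) \tag{1}$$
$$\mathrm{rank}\begin{bmatrix} V_1 H_1 \\ V_1 \end{bmatrix} = L \tag{2}$$
$$\mathrm{span}(V_2 H_1) = \mathrm{span}(V_2) \tag{3}$$
$$\mathrm{rank}\begin{bmatrix} V_2 H_2 \\ V_2 \end{bmatrix} = L \tag{4}$$
$$\mathrm{rank}(V_1) = \mathrm{rank}(V_2) = L/2 \tag{5}$$
where $H_1$ and $H_2$ are $L \times L$ matrices over some finite field. Now, the question of interest is: is the above set
of equations feasible? In other words, can we choose matrices $H_i, V_i$ so that the above equations are satisfied?
We assume that the field size and the size of $H_i$, i.e., L, are parameters of choice. Because of (5), we can assume
without loss of generality that $V_i, i = 1, 2$ are $L/2 \times L$ matrices.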

Now, (1) and (3) imply that the space spanned by the rows of $V_j$ is an invariant subspace of $H_i$, for $i = 1, 2$,
$j \in \{1, 2\} - \{i\}$. Further, (2) and (4) imply that the row spaces of $V_i$ and $V_i H_i$ intersect only in the zero vector, for $i = 1, 2$.⁵ Before
solving this problem, it is worth noting that $V_i$ has to have at least L/2 linearly independent row vectors (or,
equivalently, a rank of at least L/2) in order to satisfy (2) and (4). Further, note that if we had allowed $V_i, i = 1, 2$,
to each have a rank as large as L rather than L/2 in equation (5), the problem would have been trivial, since any full
rank matrices $V_i, H_i, i = 1, 2$ would satisfy conditions (1)–(4). The question posed here, however, is
whether there exist matrices $V_i$ having exactly L/2 linearly independent row vectors that satisfy the above equations.
It turns out that this problem has a fairly simple solution with L = 2 and field size q = 5. To see this, note that
with L = 2, (1) and (3) can be interpreted as (left) eigenvector equations. Therefore, we can choose $V_1$ to be a left eigenvector
of $H_2$ and $V_2$ to be a left eigenvector of $H_1$. As long as $H_1$ and $H_2$ can be chosen so that the chosen eigenvector of
each matrix is not an eigenvector of the other (e.g., $H_1$ and $H_2$ share no left eigenvector), equations (2) and (4) are also satisfied. It can be verified that in a field
of size 5, $H_1$ and $H_2$ can be chosen so that this property is satisfied. In fact, in a sufficiently large field, the
entries of $H_i, i = 1, 2$ can be chosen randomly, independently and uniformly over the field. With
such a choice, it can be shown that if $V_i$ is chosen to be an eigenvector of $H_j$, $j \neq i$, the equations (1)–(5)
are satisfied with a non-zero probability, thus guaranteeing feasibility. The solution to this problem automatically
implies that for n = 4, k = 2, a single failed systematic node can be repaired by downloading exactly half the data
stored in every surviving node (see Fig. 1).
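For L = 2 the argument above can be checked exhaustively. The Python sketch below is our own illustration, not the paper's procedure: it brute-forces invertible 2 × 2 matrices over GF(5), represents each one-dimensional row space by a canonical vector, and takes $V_1$ (resp. $V_2$) to be a left eigenvector of $H_2$ (resp. $H_1$) that is not a left eigenvector of $H_1$ (resp. $H_2$). The extra filter requiring $H_2 - H_1$ to be invertible is an assumption about a hypothetical (4, 2) layout (node 3 storing $a_1 + a_2$, node 4 storing $H_1 a_1 + H_2 a_2$); together with the invertibility of $H_1$ and $H_2$, it makes that code MDS.

```python
import itertools

Q = 5  # field size; Q is prime, so arithmetic mod Q is the field GF(Q)


def vec_mat(v, M):
    """Row vector v (length 2) times the 2 x 2 matrix M over GF(Q)."""
    return tuple((v[0] * M[0][c] + v[1] * M[1][c]) % Q for c in range(2))


def det2(u, w):
    """Determinant of the 2 x 2 matrix whose rows are u and w, over GF(Q)."""
    return (u[0] * w[1] - u[1] * w[0]) % Q


def mat_sub(A, B):
    """Entry-wise difference A - B over GF(Q)."""
    return tuple(tuple((A[r][c] - B[r][c]) % Q for c in range(2)) for r in range(2))


# One representative row vector for each 1-dimensional subspace of GF(Q)^2.
LINES = [(1, x) for x in range(Q)] + [(0, 1)]


def left_eigvecs(M):
    """Representatives v with vM parallel to v, i.e. span(vM) = span(v) for invertible M."""
    return [v for v in LINES if det2(v, vec_mat(v, M)) == 0]


def solve_problem1():
    """Search for invertible H1, H2 and 1 x 2 rows V1, V2 satisfying (1)-(5) with L = 2."""
    all_mats = (((a, b), (c, d)) for a, b, c, d in itertools.product(range(Q), repeat=4))
    invertible = [M for M in all_mats if det2(M[0], M[1]) != 0]
    for H1 in invertible:
        eig1 = left_eigvecs(H1)
        for H2 in invertible:
            # Extra check (not part of Problem 1): H2 - H1 invertible makes the (4, 2) code MDS.
            if det2(*mat_sub(H2, H1)) == 0:
                continue
            eig2 = left_eigvecs(H2)
            v1 = [v for v in eig2 if v not in eig1]  # (1) holds; (2) since V1 H1 is not parallel to V1
            v2 = [v for v in eig1 if v not in eig2]  # (3) holds; (4) likewise
            if v1 and v2:
                # (5): each V_i is a single nonzero row, so rank(V_i) = 1 = L/2.
                return H1, H2, v1[0], v2[0]
    return None


if __name__ == "__main__":
    print(solve_problem1())
```

Because the field is tiny, the search terminates almost immediately; for larger L or q one would instead sample the entries of $H_i$ at random, as described above.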

Problem 2: Increase the number of constraints in Problem 1 

Now, let us generalize Problem 1. The goal of this generalized version is to verify the feasibility of the following
equations, where $H_i, i = 1, 2, \ldots, N$ are $L \times L$ matrices and $V_i, i = 1, 2, \ldots, N$ are $L/2 \times L$ matrices.

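$$\mathrm{span}(V_i H_j) = \mathrm{span}(V_i), \quad j \in \{1, 2, \ldots, N\} - \{i\}, \; i = 1, 2, \ldots, N \tag{6}$$
$$\mathrm{rank}\begin{bmatrix} V_i H_i \\ V_i \end{bmatrix} = L, \quad i = 1, 2, \ldots, N \tag{7}$$
$$\mathrm{rank}(V_i) = \mathrm{rank}(H_i)/2 = L/2, \quad i = 1, 2, \ldots, N \tag{8}$$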
⁵ For the reader familiar with interference alignment literature in wireless communications, equations (1) and (3) are similar to the conditions
that all the interference aligns along $V_i$, where $H_j$, $j \neq i$, is analogous to the channel matrix corresponding to an interfering link (see [15]
for example). Similarly, conditions (2) and (4) are analogous to the condition that the desired signal appearing along matrix $V_i H_i$ is linearly
resolvable from the aligned interference $V_i$. The key difference between this problem and most of the interference alignment literature in
wireless communications is that, here, unlike in the latter, the matrices $H_i$ are design choices.
Fig. 1. Repair of the first failed node in a (4, 2) MDS code-based system along the lines of [4]. Note that the repair is possible because (1)
enables cancellation of $V_1 a_2$ and (2) enables reconstruction of $a_1$. Similarly, equations (3) and (4) enable repair when the second node fails, where
$V_2$ is used for obtaining linear combinations from the surviving nodes.



where L is a parameter of choice. Here, it is worth noting two things. First, as before, if we had intended to find
$L \times L$ matrices $V_i$ satisfying (6) and (7), the problem would have been trivial. Also, $V_i$ can have no fewer than L/2
rows because of (7). The question here, as before, is to construct $V_i$, each of which has exactly L/2 row vectors,
satisfying the above conditions. The second point worth noting is that Problem 2 is more challenging than Problem
1 because its constraints are more strict than those of Problem 1. Problem 1 is, in fact, a special
case of the above problem with N = 2. However, as N increases, the number of constraints increases. This imposes
additional constraints on the choice of matrices as compared to Problem 1. For instance, we will need $H_1$
and $H_2$ to have N − 2 distinct common invariant subspaces $V_m$, $m \notin \{1, 2\}$, in addition to the condition that
$V_1$ (resp. $V_2$) is invariant w.r.t. $H_2$ (resp. $H_1$) but linearly independent of $V_1 H_1$ (resp. $V_2 H_2$). Therefore, it is not
clear at first sight whether feasibility can be established for arbitrary N.

References [6], [7] show that the above constraints can be satisfied asymptotically, as L → ∞, by using random
diagonal matrices for $H_i$ and the asymptotic interference alignment solution of [15] to construct $V_i$ for
$i = 1, 2, \ldots, N$. However, it was not known whether the above set of constraints is feasible when L is restricted to be
finite; it is this open problem that is solved in this paper. In particular, we will use a tensor-product based framework
which enables us to decompose this problem into several instances of Problem 1 and hence show feasibility. Put
differently, the framework will enable us to stitch together multiple instances of Problem 1 using the idea of tensor products
to solve the above problem.
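Conditions (6)–(8) can be checked mechanically for candidate matrices over a prime field. The following Python helper is a sketch under our own assumptions (hypothetical function names, matrices as lists of rows, field GF(p) with p prime); it tests span equality through the fact that span(A) = span(B) if and only if rank A = rank B = rank [A; B].

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p), by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank, ncols = 0, (len(m[0]) if m else 0)
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # Fermat inverse; valid because p is prime
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def matmul(A, B, p):
    """Matrix product A B over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def same_rowspan(A, B, p):
    """span(A) = span(B) iff rank A = rank B = rank [A; B]."""
    return rank_mod_p(A, p) == rank_mod_p(B, p) == rank_mod_p(A + B, p)


def check_problem2(Hs, Vs, p):
    """True iff the L x L matrices Hs and the (L/2) x L matrices Vs satisfy (6)-(8)."""
    N, L = len(Hs), len(Hs[0])
    for i in range(N):
        Vi = Vs[i]
        # (8): rank(V_i) = rank(H_i)/2 = L/2
        if rank_mod_p(Hs[i], p) != L or rank_mod_p(Vi, p) != L // 2:
            return False
        # (6): span(V_i H_j) = span(V_i) for every j != i
        for j in range(N):
            if j != i and not same_rowspan(matmul(Vi, Hs[j], p), Vi, p):
                return False
        # (7): rank [V_i H_i; V_i] = L
        if rank_mod_p(matmul(Vi, Hs[i], p) + Vi, p) != L:
            return False
    return True
```

With N = 2 the helper checks Problem 1 as a special case. For example, over GF(5), $H_1$ = [[0, 1], [1, 0]], $H_2$ = diag(1, 2), $V_1$ = [1, 0] and $V_2$ = [1, 1] pass, while replacing $V_2$ by [1, 0] violates (6).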

The rest of this paper is organized as follows. We will first present a framework used for our repair-optimal codes
in the next section, where we will also connect the repair problem to the problem presented above.
In Sections V and VI, we will respectively present random codes and explicit codes based on permutation matrices
which are optimal from the perspective of repair of a single systematic node. These constructions can be interpreted
as solutions to the above problem where the $H_i$ are permutation matrices. In Section VII, we will revisit the problem
described above and present our tensor-product based framework to solve this problem. The framework of Section
VII generalizes the permutation matrix based construction of Section V.



IV. System Model - Optimal Repair for an (n, k) MDS Code 

In this section, we present a general framework for optimal repair of a single failed node in a linear MDS code
based distributed storage system. Consider k sources, all of equal size C = M/k over a field $\mathbb{F}_q$ of size q. Source
$i \in \{1, 2, \ldots, k\}$ is represented by the $C \times 1$ vector $a_i \in \mathbb{F}_q^{C}$. Note here that M denotes the size of the total
information stored in the distributed storage system, in terms of the number of elements over the field. There are
n nodes storing the k source (vector) symbols using an (n, k) MDS code. Each node stores data of size C, i.e.,
each coded (vector) symbol of the (n, k) code is a $C \times 1$ vector. Therefore, 1 unit is equivalent to C scalars over
the field $\mathbb{F}_q$. The data stored in node j is represented by the $C \times 1$ vector $d_j$, where $j = 1, 2, \ldots, n$. We assume that our
code is linear and can be represented as
$$d_j = \sum_{i=1}^{k} C_{j,i}\, a_i, \quad j = 1, 2, \ldots, n,$$
where the $C_{j,i}$ are $C \times C$ square matrices. Further, we restrict our codes to have a systematic structure, so that, for
$j \in \{1, 2, \ldots, k\}$,
$$C_{j,i} = \begin{cases} I_C, & i = j, \\ 0, & i \neq j. \end{cases}$$

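Since we restrict our attention to MDS codes, we will need the matrices $C_{j,i}$ to satisfy the following property:
$$\mathrm{rank}\begin{bmatrix} C_{j_1,1} & C_{j_1,2} & \cdots & C_{j_1,k} \\ C_{j_2,1} & C_{j_2,2} & \cdots & C_{j_2,k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{j_k,1} & C_{j_k,2} & \cdots & C_{j_k,k} \end{bmatrix} = kC \tag{9}$$
for any distinct $j_1, j_2, \ldots, j_k \in \{1, 2, \ldots, n\}$.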
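The MDS rank condition above can be verified exhaustively for toy parameters by enumerating every set of k nodes. The sketch below is illustrative only (hypothetical names, prime field GF(p), each sub-matrix stored as a Python list of rows); its cost grows with the number of k-subsets of the n nodes, so it is not meant for realistic sizes.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p), by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank, ncols = 0, (len(m[0]) if m else 0)
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # Fermat inverse; requires p prime
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_mds(blocks, k, C, p):
    """blocks[j][i] is the C x C sub-matrix mapping source i to node j (0-based), as a list of rows.

    Returns True iff every choice of k distinct nodes yields a kC x kC block matrix of rank kC.
    """
    n = len(blocks)
    for subset in combinations(range(n), k):
        # Block row of node j: row r of C_{j,1}, ..., C_{j,k} concatenated.
        big = [[x for i in range(k) for x in blocks[j][i][r]]
               for j in subset for r in range(C)]
        if rank_mod_p(big, p) != k * C:
            return False
    return True


if __name__ == "__main__":
    I2, Z2 = [[1, 0], [0, 1]], [[0, 0], [0, 0]]
    H1, H2 = [[0, 1], [1, 0]], [[1, 0], [0, 2]]
    # Hypothetical (4, 2) code over GF(5): nodes 1, 2 systematic, node 3 = a1 + a2, node 4 = H1 a1 + H2 a2.
    print(is_mds([[I2, Z2], [Z2, I2], [I2, I2], [H1, H2]], k=2, C=2, p=5))
```

In the example, the (4, 2) code is MDS because $H_1$, $H_2$ and $H_2 - H_1$ are all invertible over GF(5); setting $H_2 = H_1$ breaks the condition for the pair of parity nodes.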

The MDS property ensures that the storage system can tolerate up to (n − k) failures (erasures), since all the
sources can be reconstructed from any k nodes whose indices are represented by $j_1, j_2, \ldots, j_k \in \{1, 2, \ldots, n\}$. Now,
consider the case where a single systematic node, say node $i \in \{1, 2, \ldots, k\}$, fails. The goal here is to reconstruct
the failed node i, i.e., to reconstruct $d_i$, using all the other n − 1 nodes, i.e., $\{d_j : j \neq i\}$. To understand the
solution, first consider the case where node 1 fails. We download a fraction 1/(n − k) of the data stored in each of the
n − 1 nodes $\{1, 2, \ldots, n\} - \{1\}$, so that the total repair bandwidth is (n − 1)/(n − k) units. We focus on linear repair solutions for
our codes, which implies that we need to download C/(n − k) linear combinations from each $d_j$, $j \in \{2, 3, \ldots, n\}$.



Specifically, we denote the linear combinations downloaded from node $j \in \{2, 3, \ldots, n\}$ by
$$V_{1,j} d_j = V_{1,j} \sum_{i=1}^{k} C_{j,i}\, a_i,$$
where $V_{1,j}$ is a $C/(n-k) \times C$ dimensional matrix. The matrices $V_{1,j}$ are referred to as repair matrices in this paper.
The goal of the problem is to reconstruct the C components of $a_1$ from the above equations. For systematic nodes
$j \in \{2, 3, \ldots, k\}$, the equations downloaded by the new node do not contain information about the desired signal $a_1$,
since for these nodes $C_{j,1} = 0$. The linear combinations downloaded from the remaining nodes $j \in \{k+1, k+2, \ldots, n\}$,
however, contain components of both the desired signal and the interference. Thus, the downloaded linear
combinations $V_{1,j} d_j$ are of two types.
1) The data downloaded from the surviving systematic nodes $j = 2, \ldots, k$ contain no information about the desired
signal $a_1$, i.e.,
$$V_{1,j} d_j = V_{1,j} a_j, \quad j = 2, \ldots, k.$$
Note that there are C/(n − k) such linear combinations of each interfering component $a_j$, $j = 2, 3, \ldots, k$.
2) Now, from each of the n − k parity nodes, C/(n − k) linear combinations are downloaded. Therefore, a total of C
linear combinations are downloaded from parity nodes. The C components of the desired signal have to be
reconstructed using these C linear combinations of the form $V_{1,j} d_j$, $j = k+1, k+2, \ldots, n$. Note here that these
are C linear equations in kC scalars: the C desired components of $a_1$ and (k − 1)C interfering components
of $a_2, a_3, \ldots, a_k$. For successful reconstruction of the desired signal, the interference terms associated with
$a_j$, $j = 2, \ldots, k$, contained in these linear combinations need to be cancelled completely.
The goal of our solution will be to completely cancel the interference from the second set of C linear combinations,
using the first set of linear combinations. Then $a_1$ is regenerated using this second set of C interference-free linear
combinations (see Fig. 2).

Fig. 2. Repair of the first node for n = 5, k = 3. Equation (10) ensures that interference cancellation is possible.
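To make the two-step procedure concrete (cancel interference using the systematic downloads, then solve the interference-free parity equations), here is a small Python sketch for n = 4, k = 2, C = 2 over GF(5). It is an illustration, not the paper's construction: the matrices $H_1$, $H_2$ and the repair vector $V_1$ are example values we chose to satisfy conditions (1)–(5) of Section III, node 3 stores $a_1 + a_2$ and node 4 stores $H_1 a_1 + H_2 a_2$ (so $C_{3,i} = I$ and $C_{4,i} = H_i$), and the same repair vector $V_1$ is used at every surviving node.

```python
Q = 5                   # prime field size
H1 = ((0, 1), (1, 0))   # example coding sub-matrix C_{4,1} (assumed, chosen to satisfy (1)-(5))
H2 = ((1, 0), (0, 2))   # example coding sub-matrix C_{4,2}
V1 = (1, 0)             # left eigenvector of H2 that is not a left eigenvector of H1


def mat_vec(M, x):
    """M x for a 2 x 2 matrix M and a column vector x over GF(Q)."""
    return tuple((M[r][0] * x[0] + M[r][1] * x[1]) % Q for r in range(2))


def vec_mat(v, M):
    """v M for a row vector v and a 2 x 2 matrix M over GF(Q)."""
    return tuple((v[0] * M[0][c] + v[1] * M[1][c]) % Q for c in range(2))


def dot(v, x):
    """Row vector times column vector: the single scalar downloaded from one node."""
    return (v[0] * x[0] + v[1] * x[1]) % Q


def encode(a1, a2):
    """Nodes 1, 2 are systematic; node 3 stores a1 + a2; node 4 stores H1 a1 + H2 a2."""
    d3 = tuple((x + y) % Q for x, y in zip(a1, a2))
    d4 = tuple((x + y) % Q for x, y in zip(mat_vec(H1, a1), mat_vec(H2, a2)))
    return [a1, a2, d3, d4]


def repair_node1(d2, d3, d4):
    """Rebuild a1 from one scalar V1 d_j per surviving node (3 scalars = 3/2 units)."""
    s2, s3, s4 = dot(V1, d2), dot(V1, d3), dot(V1, d4)
    # Alignment (1): V1 H2 = lam V1 for some scalar lam.
    w = vec_mat(V1, H2)
    piv = 0 if V1[0] else 1
    lam = w[piv] * pow(V1[piv], Q - 2, Q) % Q
    # Interference cancellation: s2 = V1 a2 removes the a2 terms from s3 and s4.
    b0 = (s3 - s2) % Q          # equals V1 a1
    b1 = (s4 - lam * s2) % Q    # equals (V1 H1) a1
    r0, r1 = V1, vec_mat(V1, H1)
    det = (r0[0] * r1[1] - r0[1] * r1[0]) % Q  # nonzero by condition (2)
    inv = pow(det, Q - 2, Q)
    # Cramer's rule for [r0; r1] a1 = [b0; b1].
    x0 = (b0 * r1[1] - r0[1] * b1) * inv % Q
    x1 = (r0[0] * b1 - r1[0] * b0) * inv % Q
    return (x0, x1)
```

For example, `repair_node1(*encode((1, 2), (3, 4))[1:])` returns `(1, 2)`. Each surviving node contributes one of its two stored scalars, matching the (n − 1)/(n − k) = 3/2 units of the cut-set bound.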

A. Interference Cancellation 

The linear combinations corresponding to interference component $a_i$, $i \neq 1$, downloaded from node i by the new
node are $V_{1,i} a_i$ for $i = 2, 3, \ldots, k$. To cancel the associated interference from all the remaining downloads $V_{1,j} d_j$ by
linear techniques, we need, for all $j = k+1, k+2, \ldots, n$ and all $i = 2, 3, \ldots, k$,
$$\mathrm{rowspan}(V_{1,j} C_{j,i}) \subseteq \mathrm{rowspan}(V_{1,i}) \;\Longrightarrow\; \mathrm{rowspan}(V_{1,j} C_{j,i}) = \mathrm{rowspan}(V_{1,i}), \tag{10}$$
where (10) follows because the $C_{j,i}$ are all full-rank matrices, and therefore the subset relation automatically implies
the equality relation, as $\mathrm{rank}(V_{1,j} C_{j,i}) = \mathrm{rank}(V_{1,j}) = C/(n-k) = \mathrm{rank}(V_{1,i})$. Thus, as long as (10) is satisfied for all
values $j \in \{k+1, k+2, \ldots, n\}$, $i \in \{2, 3, \ldots, k\}$, the interference components can be completely cancelled from
$V_{1,j} d_j$ to obtain $V_{1,j} C_{j,1} a_1$, $j \in \{k+1, k+2, \ldots, n\}$ (see Fig. 2). Now, we need to ensure that the desired $C \times 1$
vector $a_1$ can be uniquely resolved from the C linear combinations of the form $V_{1,j} C_{j,1} a_1$, $j = k+1, k+2, \ldots, n$.
In other words, we need to ensure that
$$\mathrm{rank}\begin{bmatrix} V_{1,k+1} C_{k+1,1} \\ V_{1,k+2} C_{k+2,1} \\ \vdots \\ V_{1,n} C_{n,1} \end{bmatrix} = C. \tag{11}$$
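A short worked step shows how (10) turns into an explicit cancellation rule. The matrix $T_{j,i}$ below is notation introduced only for this illustration; it exists because (10) places every row of $V_{1,j} C_{j,i}$ in the row span of $V_{1,i}$. Using $d_i = a_i$ for the systematic nodes $i = 2, \ldots, k$, each parity download can be cleaned as follows:

$$V_{1,j} C_{j,i} = T_{j,i} V_{1,i}, \qquad T_{j,i} \in \mathbb{F}_q^{\frac{C}{n-k} \times \frac{C}{n-k}}, \quad i = 2, \ldots, k,$$
$$V_{1,j} d_j - \sum_{i=2}^{k} T_{j,i} \left( V_{1,i} d_i \right) = V_{1,j} C_{j,1} a_1 + \sum_{i=2}^{k} \left( V_{1,j} C_{j,i} - T_{j,i} V_{1,i} \right) a_i = V_{1,j} C_{j,1} a_1, \qquad j = k+1, \ldots, n.$$

Stacking the n − k cleaned vectors gives exactly the C equations whose coefficient matrix appears in (11), so $a_1$ is obtained by inverting that matrix whenever its rank is C.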

If we construct $C_{j,i}$ and $V_{1,j}$ satisfying (10) and (11) for $j = k+1, \ldots, n$ and $i = 2, \ldots, k$, then a
failure of node 1 can be repaired with the desired minimum repair bandwidth. To solve the problem for the failure
of any other systematic node, we need to ensure similar conditions. We summarize all the conditions required for
successful reconstruction of a single failed (systematic) node with the minimum repair bandwidth below.
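• The MDS property stated above.
• The interference alignment relations:
$$\mathrm{rowspan}(V_{l,j} C_{j,i}) = \mathrm{rowspan}(V_{l,i}) \tag{12}$$
for $l = 1, 2, \ldots, k$, $j = k+1, k+2, \ldots, n$ and $i \in \{1, 2, \ldots, k\} - \{l\}$.
• Reconstruction of the failed node, given that the alignment relations are satisfied:
$$\mathrm{rank}\begin{bmatrix} V_{l,k+1} C_{k+1,l} \\ V_{l,k+2} C_{k+2,l} \\ \vdots \\ V_{l,n} C_{n,l} \end{bmatrix} = C \tag{13}$$
for $l = 1, 2, \ldots, k$.

Note that, given n and k, our design choices are C, q, the coding sub-matrices $C_{j,i}$ for $j = k+1, \ldots, n$, $i = 1, \ldots, k$,
and the repair matrices $V_{l,j}$ for $l = 1, 2, \ldots, k$, $j \in \{1, 2, \ldots, n\} - \{l\}$.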

Reference [5] has shown that the above conditions cannot be satisfied if we restrict ourselves to M = k(n − k).
References [7], [8] constructed solutions which satisfy the above relations asymptotically, as M → ∞.
The main contribution of this paper is the construction of coding sub-matrices and repair matrices so that the above
relations are satisfied exactly with finite M, i.e., with M = k(n − k)^k.

B. Connections to Problem 2 in Section III

Above, we have defined a general structure for a linear, repair bandwidth optimal solution. In the specific solution described in this paper, the repair matrices satisfy an additional property: in our solution,

$$\mathbf{V}_{l,j} = \mathbf{V}_{l,j'}$$

for all $l \in \{1, 2, \ldots, k\}$, $j \neq j' \in \{1, 2, \ldots, n\} - \{l\}$. In other words, when a node, say node $l$, fails, we download the same linear combination from every surviving node. We use the notation

$$\mathbf{V}_l \triangleq \mathbf{V}_{l,j}$$

for all $j \in \{1, 2, \ldots, n\} - \{l\}$. Further, in our solution the coding sub-matrices associated with the first parity node are all (scaled) identity matrices, i.e., $\mathbf{C}_{k+1,i} = \lambda_{k+1,i}\mathbf{I}_{\mathcal{C}}$ for $i = 1, 2, \ldots, k$, where $\lambda_{k+1,i}$ is a scalar over the field $\mathbb{F}_q$, so that

$$\mathbf{d}_{k+1} = \sum_{i=1}^{k}\lambda_{k+1,i}\mathbf{a}_i.$$



Now, with these choices, it can be noted that for $n - k = 2$, equations (12) and (13) are equivalent to (6) and (7) in the previous section, where $k = N$, $\mathcal{C} = L$ and $\mathbf{C}_{k+2,i} = \mathbf{H}_i$. Thus, the problems motivated in the previous section lie at the core of the repair problem.



C. Disk-Access Optimality 

Our solution satisfies a disk-access optimality property which is defined formally here. 

Definition 1: Consider a set of $\mathcal{C}\times\mathcal{C}$ dimensional coding sub-matrices $\mathbf{C}_{i,j}$, $i = k+1, k+2, \ldots, n$, $j = 1, 2, \ldots, k$, and a set of repair matrices $\mathbf{V}_{l,i}$ for some $l \in \{1, 2, \ldots, n\}$ and for all $i \in \{1, 2, \ldots, n\} - \{l\}$, where the repair matrix $\mathbf{V}_{l,i}$ has dimension $B_{l,i}\times\mathcal{C}$, with $B_{l,i} \leq \mathcal{C}$. The repair matrices satisfy the property that $\mathbf{d}_l$ can be reconstructed linearly from $\mathbf{V}_{l,i}\mathbf{d}_i$, $i \in \{1, 2, \ldots, n\} - \{l\}$. In other words, a failure of node $l$ can be repaired using the repair matrices. Then the amount of disk access required for the repair of node $l$ is defined to be the quantity

$$\sum_{i \in \{1, 2, \ldots, n\} - \{l\}} \omega(\mathbf{V}_{l,i}),$$

where $\omega(\mathbf{A})$ represents the number of non-zero columns of matrix $\mathbf{A}$.

To compute $\mathbf{V}_{l,i}\mathbf{d}_i$, only $\omega(\mathbf{V}_{l,i})$ entries of the vector $\mathbf{d}_i$ have to be accessed. This leads to the above definition for the amount of disk access for a linear solution. Also, note that $\mathrm{rank}(\mathbf{V}_{l,i})$, the amount of bandwidth used, is never larger than $\omega(\mathbf{V}_{l,i})$. Therefore, the amount of disk access is at least as large as the amount of bandwidth used for a given solution. This leads to the following lemma.



Lemma 1: For any $(n, k)$ MDS code storing 1 unit of data in each disk, the amount of disk access needed to repair any single failed node $l = 1, 2, \ldots, n$ is at least $\frac{n-1}{n-k}$ units.

Our code constructions based on permutation matrices, presented in the next section, are not only repair bandwidth optimal, but also optimal in terms of disk access, since they meet the bound of the above lemma. More formally, for our solution, $\mathbf{V}_l$ not only has a rank of $\mathcal{C}/(n-k)$; it also has exactly $\mathcal{C}/(n-k)$ non-zero columns. In fact, $\mathbf{V}_l$ has exactly $\mathcal{C}/(n-k)$ non-zero entries. Among the $\mathcal{C}$ columns of $\mathbf{V}_l$, $\mathcal{C} - \frac{\mathcal{C}}{n-k}$ columns are zero. This means that, to obtain the linear combination $\mathbf{V}_l\mathbf{d}_i$ from node $i$ for repair of node $l \neq i$, only $\frac{\mathcal{C}}{n-k}$ entries of node $i$ have to be accessed. We now proceed to describe our solution.
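As a small illustration of Definition 1 and Lemma 1 (a sketch, not part of the original construction), the Python snippet below computes the disk-access measure $\omega(\cdot)$, i.e., the number of non-zero columns, and the rank (bandwidth) of a repair matrix over a prime field. It assumes the (5, 3) example of the next section, where the repair matrix for node 1 selects the rows $\mathbf{e}(1), \ldots, \mathbf{e}(4)$ of $\mathbf{I}_8$; the dense matrix `V_dense` and the field size `p = 7` are hypothetical choices made only for contrast.

```python
def omega(V):
    """Disk-access measure: number of non-zero columns of V."""
    return sum(1 for c in range(len(V[0])) if any(row[c] != 0 for row in V))


def rank_mod_p(M, p):
    """Rank of an integer matrix over the prime field F_p (Gauss-Jordan elimination)."""
    A = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        pivot = next((r for r in range(rank, len(A)) if A[r][c]), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][c], p - 2, p)          # Fermat inverse, valid because p is prime
        A[rank] = [x * inv % p for x in A[rank]]
        for r in range(len(A)):
            if r != rank and A[r][c]:
                f = A[r][c]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank


C, p = 8, 7
# Selection-type repair matrix for node 1 in the (5, 3) example: rows e(1), ..., e(4).
V1 = [[int(col == row) for col in range(C)] for row in range(4)]
# Hypothetical dense alternative with the same rank, hence the same bandwidth.
V_dense = [[1] * C] + V1[1:]

for name, V in (("V1", V1), ("V_dense", V_dense)):
    print(name, "bandwidth rows:", rank_mod_p(V, p), "disk accesses:", omega(V))
# V1 bandwidth rows: 4 disk accesses: 4
# V_dense bandwidth rows: 4 disk accesses: 8
```

Both matrices download 4 symbols per node, but only the selection-type matrix reads just 4 stored symbols. Over the four surviving nodes it accesses 16 symbols, i.e., 2 units of 8 symbols, which equals $(n-1)/(n-k) = 2$.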

V. Optimal Codes via Permutation Matrices 

In this section, we describe a set of random codes based on permutation matrices satisfying the desired properties 
described in the previous section. We begin with some preliminary notations required for our description. 

Notations and Preliminary Definitions: The bold font is used for vectors and matrices and the regular font is reserved for scalars. Given an $l\times 1$ dimensional vector $\mathbf{a}$, its $l$ components are denoted by

$$\mathbf{a} = \begin{pmatrix} a(1)\\ a(2)\\ \vdots\\ a(l)\end{pmatrix}.$$

For example, $\mathbf{d}_1 = [d_1(1)\; d_1(2)\; \ldots\; d_1(\mathcal{C})]^T$. Given a set $\mathcal{A}$, the $l$-dimensional Cartesian product of the set is denoted by $\mathcal{A}^l$. The notation $\mathbf{I}_l$ denotes the $l\times l$ identity matrix; the subscript $l$ is dropped when the size $l$ is clear from the context. Next, we define a set of functions which will be useful in the description of our codes.

Given $(n, k)$ and a number $m \in \{1, 2, \ldots, (n-k)^k\}$, we define a function⁶ $\phi: \{1, 2, \ldots, (n-k)^k\} \to \{0, 1, \ldots, n-k-1\}^k$ such that $\phi(m)$ is the unique $k$-dimensional vector whose $k$ components represent the $k$-length representation of $m - 1$ in base $(n-k)$, with the first component as the most significant digit. In other words,

$$\phi(m) = (r_1, r_2, \ldots, r_k) \iff m - 1 = \sum_{i=1}^{k} r_i (n-k)^{k-i},$$

where $r_i \in \{0, 1, \ldots, n-k-1\}$. Further, we denote the $i$th component of $\phi(m)$ by $\phi_i(m)$, for $i = 1, 2, \ldots, k$. Since the $k$-length representation of a number in base $(n-k)$ is unique, $\phi$ and $\phi_i$ are well defined functions. Further, $\phi$ is invertible and its inverse is denoted by $\phi^{-1}$. We also use the following compressed notation for $\phi^{-1}$:

$$\langle r_1, r_2, \ldots, r_k\rangle = \phi^{-1}(r_1, r_2, \ldots, r_k) = \sum_{i=1}^{k} r_i (n-k)^{k-i} + 1.$$

The definition of the above functions will be useful in constructing our codes. 
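The index maps $\phi$ and $\langle\cdot\rangle$ are easy to mistype, so a minimal Python sketch may help. It assumes, as in the (5, 3) example below (where $\langle 1, 0, 0\rangle = 5$), that $r_1$ is the most significant base-$(n-k)$ digit; the function names `phi` and `angle` are ours, not the paper's.

```python
def phi(m, n, k):
    """phi(m): base-(n-k) digits (r_1, ..., r_k) of m - 1, with r_1 most significant."""
    base, x, digits = n - k, m - 1, []
    for _ in range(k):
        digits.append(x % base)
        x //= base
    return tuple(reversed(digits))


def angle(r, n, k):
    """Compressed inverse <r_1, ..., r_k> = phi^{-1}(r_1, ..., r_k)."""
    base = n - k
    return sum(ri * base ** (k - i) for i, ri in enumerate(r, start=1)) + 1


n, k = 5, 3
print([phi(m, n, k) for m in (1, 2, 5, 8)])
# [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)]
print(angle((1, 0, 0), n, k))  # 5
```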
A. Example: n = 5, k = 3

We motivate our code by first considering the case where $k = 3$, $n = 5$ for simplicity. The extension of the code to arbitrary $n, k$ will follow later.⁷ For $n = 5$, $k = 3$, we have $\mathcal{M}/k = (n-k)^k = 2^3 = 8$. As the name suggests, we use scaled permutation matrices for $\mathbf{C}_{i,j}$, $j \in \{1, 2, \ldots, k\}$, $i \in \{k+1, k+2, \ldots, n\}$. Note here that the variables $\mathbf{a}_j$, $j = 1, 2, \ldots, k$, are $(n-k)^k\times 1$ dimensional vectors. We represent the $(n-k)^k = 8$ components of these vectors by the $k = 3$ bit representation of their indices as



⁶ While the functions defined here are parametrized by $n, k$, these quantities are not explicitly denoted here for brevity of notation.
⁷ Optimal codes for $n = 5$, $k = 3$ have been proposed in [6], [11]. We only use this case to demonstrate our construction in the simplest non-trivial setting.
Fig. 3. The two parity nodes in the (5, 3) code and repair strategy for failure of node 1. Shaded portions indicate downloaded portions used to recover failure of node 1. Note that the undesired symbols can be cancelled by downloading half the components of $\mathbf{a}_2, \mathbf{a}_3$, i.e., by downloading $a_2(\langle 0, x_1, x_2\rangle)$ and $a_3(\langle 0, x_1, x_2\rangle)$ for $x_1, x_2 \in \{0, 1\}$.



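$$\mathbf{a}_j = \begin{pmatrix} a_j(1)\\ a_j(2)\\ \vdots\\ a_j(8)\end{pmatrix} = \begin{pmatrix} a_j(\langle 0,0,0\rangle)\\ a_j(\langle 0,0,1\rangle)\\ a_j(\langle 0,1,0\rangle)\\ a_j(\langle 0,1,1\rangle)\\ a_j(\langle 1,0,0\rangle)\\ a_j(\langle 1,0,1\rangle)\\ a_j(\langle 1,1,0\rangle)\\ a_j(\langle 1,1,1\rangle)\end{pmatrix}$$

for all $j = 1, 2, \ldots, k$. Now, similarly, we can denote the identity matrix as

$$\mathbf{I}_8 = \begin{pmatrix}\mathbf{e}(1)\\ \mathbf{e}(2)\\ \vdots\\ \mathbf{e}(8)\end{pmatrix} = \begin{pmatrix}\mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,0,1\rangle)\\ \vdots\\ \mathbf{e}(\langle 1,1,1\rangle)\end{pmatrix}$$

where, naturally, $\mathbf{e}(i)$ is the $i$th row of the identity matrix. Now, we describe our code as follows. Since the first three storage nodes are systematic nodes and the remaining two are parity nodes, the design parameters are $\mathbf{C}_{4,j}$, $\mathbf{C}_{5,j}$, $\mathbf{V}_j$, for $j = 1, 2, 3$. We choose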

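$$\mathbf{C}_{4,j} = \lambda_{4,j}\begin{pmatrix}\mathbf{e}(1)\\ \mathbf{e}(2)\\ \vdots\\ \mathbf{e}(8)\end{pmatrix} = \lambda_{4,j}\mathbf{I}$$

so that

$$\mathbf{d}_4 = \sum_{j=1}^{3}\lambda_{4,j}\mathbf{a}_j,$$

where $\lambda_{4,j}$ are independent random scalars chosen using a uniform distribution over the field $\mathbb{F}_q$. Now, consider the $8\times 8$ permutation matrices defined as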



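$$\mathbf{P}_1 = \begin{pmatrix} \mathbf{e}(\langle 1,0,0\rangle)\\ \mathbf{e}(\langle 1,0,1\rangle)\\ \mathbf{e}(\langle 1,1,0\rangle)\\ \mathbf{e}(\langle 1,1,1\rangle)\\ \mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,0,1\rangle)\\ \mathbf{e}(\langle 0,1,0\rangle)\\ \mathbf{e}(\langle 0,1,1\rangle)\end{pmatrix},\quad
\mathbf{P}_2 = \begin{pmatrix} \mathbf{e}(\langle 0,1,0\rangle)\\ \mathbf{e}(\langle 0,1,1\rangle)\\ \mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,0,1\rangle)\\ \mathbf{e}(\langle 1,1,0\rangle)\\ \mathbf{e}(\langle 1,1,1\rangle)\\ \mathbf{e}(\langle 1,0,0\rangle)\\ \mathbf{e}(\langle 1,0,1\rangle)\end{pmatrix},\quad
\mathbf{P}_3 = \begin{pmatrix} \mathbf{e}(\langle 0,0,1\rangle)\\ \mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,1,1\rangle)\\ \mathbf{e}(\langle 0,1,0\rangle)\\ \mathbf{e}(\langle 1,0,1\rangle)\\ \mathbf{e}(\langle 1,0,0\rangle)\\ \mathbf{e}(\langle 1,1,1\rangle)\\ \mathbf{e}(\langle 1,1,0\rangle)\end{pmatrix}$$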



Then, the fifth node (i.e., the second parity node) is designed as

$$\mathbf{d}_5 = \sum_{j=1}^{3}\lambda_{5,j}\mathbf{P}_j\mathbf{a}_j,$$

where $\lambda_{5,j}$ are random independent scalars drawn uniformly over the entries of the field $\mathbb{F}_q$. In other words, we have

$$\mathbf{C}_{5,j} = \lambda_{5,j}\mathbf{P}_j, \quad j = 1, 2, 3.$$

The code is depicted in Fig. 3. For a better understanding of the structure of the permutations, consider an arbitrary column vector $\mathbf{a} = [a(1)\; a(2)\; \ldots\; a(8)]^T$. Then,

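$$\mathbf{P}_1\mathbf{a} = \begin{pmatrix} a(\langle 1,0,0\rangle)\\ a(\langle 1,0,1\rangle)\\ a(\langle 1,1,0\rangle)\\ a(\langle 1,1,1\rangle)\\ a(\langle 0,0,0\rangle)\\ a(\langle 0,0,1\rangle)\\ a(\langle 0,1,0\rangle)\\ a(\langle 0,1,1\rangle)\end{pmatrix} = \begin{pmatrix} a(5)\\ a(6)\\ a(7)\\ a(8)\\ a(1)\\ a(2)\\ a(3)\\ a(4)\end{pmatrix}.$$

In other words, $\mathbf{P}_1$ is a permutation of the components of $\mathbf{a}$ such that the element $a(\langle 1, x_2, x_3\rangle)$ is swapped with the element $a(\langle 0, x_2, x_3\rangle)$ for $x_2, x_3 \in \{0, 1\}$. Similarly, $\mathbf{P}_2$ swaps $a(\langle x_1, 0, x_3\rangle)$ with $a(\langle x_1, 1, x_3\rangle)$ and $\mathbf{P}_3$ swaps $a(\langle x_1, x_2, 0\rangle)$ with $a(\langle x_1, x_2, 1\rangle)$, where $x_1, x_2, x_3 \in \{0, 1\}$.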
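The following Python sketch (illustrative only; `perm_matrix` and `apply` are hypothetical helper names) builds $\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3$ for the (5, 3) example by increasing digit $i$ of $\phi(m)$ by one modulo $n - k$, and applies each matrix to the label vector $\mathbf{a} = (1, 2, \ldots, 8)^T$, so the printed output shows which component lands in each position.

```python
from itertools import product


def perm_matrix(i, n=5, k=3):
    """P_i as a 0/1 list of lists: row m is e(<chi_i(m)>), digit i of phi(m) plus 1 mod n-k."""
    base = n - k
    tuples = list(product(range(base), repeat=k))   # phi(1), phi(2), ... in order
    index = {t: m for m, t in enumerate(tuples)}    # tuple -> 0-based position <t> - 1
    P = [[0] * len(tuples) for _ in tuples]
    for m, t in enumerate(tuples):
        s = list(t)
        s[i - 1] = (s[i - 1] + 1) % base
        P[m][index[tuple(s)]] = 1
    return P


def apply(P, a):
    """Matrix-vector product P a for plain Python lists."""
    return [sum(pij * aj for pij, aj in zip(row, a)) for row in P]


a = list(range(1, 9))  # a(m) = m, so the output lists the indices that were picked
for i in (1, 2, 3):
    print(f"P_{i} a =", apply(perm_matrix(i), a))
# P_1 a = [5, 6, 7, 8, 1, 2, 3, 4]
# P_2 a = [3, 4, 1, 2, 7, 8, 5, 6]
# P_3 a = [2, 1, 4, 3, 6, 5, 8, 7]
```

Because $n - k = 2$, each $\mathbf{P}_i$ swaps pairs of components and is therefore its own inverse.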

Now, we show that this code can be used to achieve optimal recovery, in terms of repair bandwidth, for a single failed systematic node. To see this, consider the case where node 1 fails. Note that for optimal repair, the new node has to download a fraction of $\frac{1}{n-k} = \frac{1}{2}$ of every surviving node, i.e., nodes 2, 3, 4, 5. The repair strategy is to download $d_i(\langle 0,0,0\rangle)$, $d_i(\langle 0,0,1\rangle)$, $d_i(\langle 0,1,0\rangle)$, $d_i(\langle 0,1,1\rangle)$ from node $i \in \{2, 3, 4, 5\}$, so that

$$\mathbf{V}_1 = \begin{pmatrix}\mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,0,1\rangle)\\ \mathbf{e}(\langle 0,1,0\rangle)\\ \mathbf{e}(\langle 0,1,1\rangle)\end{pmatrix} = \begin{pmatrix}\mathbf{e}(1)\\ \mathbf{e}(2)\\ \mathbf{e}(3)\\ \mathbf{e}(4)\end{pmatrix}.$$

In other words, the rows of $\mathbf{V}_1$ come from the set $\{\mathbf{e}(\langle 0, x_2, x_3\rangle) : x_2, x_3 \in \{0, 1\}\}$. Note that the strategy downloads half the data stored in every surviving node, as required. With these download vectors, it can be observed (see Fig. 3) that the interference is aligned as required and all the 8 components of the desired signal $\mathbf{a}_1$ can be reconstructed. Specifically, we note that



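$$\mathrm{rowspan}(\mathbf{V}_1\mathbf{C}_{4,i}) = \mathrm{rowspan}(\mathbf{V}_1\mathbf{C}_{5,i}) = \mathrm{span}(\{\mathbf{e}(\langle 0, x_2, x_3\rangle) : x_2, x_3 \in \{0, 1\}\}) \tag{14}$$

for $i = 2, 3$. Put differently, because of the structure of the permutations, the downloaded components can be expressed as

$$\begin{aligned} d_4(\langle 0, x_2, x_3\rangle) &= \lambda_{4,1}a_1(\langle 0, x_2, x_3\rangle) + \lambda_{4,2}a_2(\langle 0, x_2, x_3\rangle) + \lambda_{4,3}a_3(\langle 0, x_2, x_3\rangle)\\ d_5(\langle 0, x_2, x_3\rangle) &= \lambda_{5,1}a_1(\langle 1, x_2, x_3\rangle) + \lambda_{5,2}a_2(\langle 0, x_2\oplus 1, x_3\rangle) + \lambda_{5,3}a_3(\langle 0, x_2, x_3\oplus 1\rangle)\end{aligned}$$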



Note that since $x_2, x_3 \in \{0, 1\}$, there are a total of 8 components described in the two equations above, such that all the interference is of the form $a_i(\langle 0, y_2, y_3\rangle)$, $i \in \{2, 3\}$, $y_2, y_3 \in \{0, 1\}$. In other words, the interference from $\mathbf{a}_i$, $i = 2, 3$, comes from only half its components, and the interference is aligned as described in (14). However, note that the 8 components span all the 8 components of the desired signal $\mathbf{a}_1$. Thus, the interference can be completely cancelled and the desired signal can be completely reconstructed.
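To make the repair argument concrete, here is a minimal Python simulation of the (5, 3) code over a prime field. The field size `p = 13`, the random seed, and all function names are assumptions made for illustration. The sketch encodes $\mathbf{d}_4$ and $\mathbf{d}_5$ as above, downloads only the four components $d_i(\langle 0, x_2, x_3\rangle)$ from each surviving node, cancels the aligned interference, and rebuilds $\mathbf{a}_1$.

```python
import random
from itertools import product

p = 13                                        # illustrative prime field size (assumption)
TUPLES = list(product(range(2), repeat=3))    # phi(1), ..., phi(8), x1 most significant
POS = {t: m for m, t in enumerate(TUPLES)}    # tuple -> 0-based index <t> - 1


def chi(m, i):
    """0-based index of <chi_i(m)>: flip bit i (1-based) of phi(m)."""
    t = list(TUPLES[m])
    t[i - 1] ^= 1
    return POS[tuple(t)]


def inv(x):
    """Multiplicative inverse in F_p (p prime)."""
    return pow(x, p - 2, p)


def encode(a, lam):
    """Parity nodes: d4 = sum_j lam[4,j] a_j and d5 = sum_j lam[5,j] P_j a_j (mod p)."""
    d4 = [sum(lam[4, j] * a[j][m] for j in (1, 2, 3)) % p for m in range(8)]
    d5 = [sum(lam[5, j] * a[j][chi(m, j)] for j in (1, 2, 3)) % p for m in range(8)]
    return d4, d5


def repair_node1(a, d4, d5, lam):
    """Rebuild a_1 from the symbols with index <0, x2, x3> of nodes 2, 3, 4, 5."""
    S = [m for m in range(8) if TUPLES[m][0] == 0]            # rows of V_1
    got = {2: {m: a[2][m] for m in S}, 3: {m: a[3][m] for m in S},
           4: {m: d4[m] for m in S}, 5: {m: d5[m] for m in S}}
    a1 = [None] * 8
    for m in S:
        # Node 4 yields a_1(<0, x2, x3>) once the downloaded a_2, a_3 terms are removed.
        a1[m] = (got[4][m] - lam[4, 2] * got[2][m] - lam[4, 3] * got[3][m]) * inv(lam[4, 1]) % p
        # Node 5 yields a_1(<1, x2, x3>); chi(m, 2) and chi(m, 3) stay in S (aligned interference).
        interference = lam[5, 2] * got[2][chi(m, 2)] + lam[5, 3] * got[3][chi(m, 3)]
        a1[chi(m, 1)] = (got[5][m] - interference) * inv(lam[5, 1]) % p
    return a1, sum(len(v) for v in got.values())


random.seed(1)
a = {j: [random.randrange(p) for _ in range(8)] for j in (1, 2, 3)}
lam = {(r, j): random.randrange(1, p) for r in (4, 5) for j in (1, 2, 3)}
d4, d5 = encode(a, lam)
a1, downloaded = repair_node1(a, d4, d5, lam)
print(a1 == a[1], downloaded)   # True 16
```

The 16 downloaded symbols correspond to 2 units (one unit being 8 symbols), which matches the bound $(n-1)/(n-k) = 4/2$. Any non-zero scalars work for repair; randomness only matters for the MDS property.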

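Similarly, in case of failure of node 2, the set of rows of the repair matrix $\mathbf{V}_2$ is equal to the set $\{\mathbf{e}(\langle x_1, 0, x_3\rangle) : x_1, x_3 \in \{0, 1\}\}$, i.e.,

$$\mathbf{V}_2 = \begin{pmatrix}\mathbf{e}(\langle 0,0,0\rangle)\\ \mathbf{e}(\langle 0,0,1\rangle)\\ \mathbf{e}(\langle 1,0,0\rangle)\\ \mathbf{e}(\langle 1,0,1\rangle)\end{pmatrix} = \begin{pmatrix}\mathbf{e}(1)\\ \mathbf{e}(2)\\ \mathbf{e}(5)\\ \mathbf{e}(6)\end{pmatrix}.$$

With this set of download vectors, it can be noted that, for $i = 1, 3$,

$$\mathrm{rowspan}(\mathbf{V}_2\mathbf{C}_{4,i}) = \mathrm{rowspan}(\mathbf{V}_2\mathbf{C}_{5,i}) = \mathrm{span}(\{\mathbf{e}(\langle x_1, 0, x_3\rangle) : x_1, x_3 \in \{0, 1\}\}) \tag{15}$$

so that the interference is aligned. It can be verified that the desired signal can be reconstructed completely because of condition (13) as well. The rows of $\mathbf{V}_3$ come from the set $\{\mathbf{e}(\langle x_1, x_2, 0\rangle) : x_1, x_2 \in \{0, 1\}\}$. Equations (12) and (13) can be verified to be satisfied for this choice of $\mathbf{V}_3$, with the alignment condition taking the following form, for $i = 1, 2$:

$$\mathrm{rowspan}(\mathbf{V}_3\mathbf{C}_{4,i}) = \mathrm{rowspan}(\mathbf{V}_3\mathbf{C}_{5,i}) = \mathrm{span}(\{\mathbf{e}(\langle x_1, x_2, 0\rangle) : x_1, x_2 \in \{0, 1\}\}) \tag{16}$$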
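Because every $\mathbf{V}_l\mathbf{C}_{j,i}$ in this example is a scaled selection of rows of the identity, its row space is determined by the set of selected column indices. The Python sketch below (a hypothetical helper, not from the paper) uses this observation to check the alignment relations (14)-(16), i.e., (12), and the reconstruction condition (13) for every failed systematic node of the (5, 3) code; the same function accepts other $(n, k)$.

```python
from itertools import product


def alignment_report(n, k):
    """Check (12) and (13) of the permutation code via index sets (hypothetical helper).

    V_l selects S_l = {m : phi_l(m) = 0}. Row m of V_l C_{j,i} is a multiple of
    e(<phi(m) with digit i increased by r mod n-k>), r = j - k - 1, so each row space
    is the span of unit vectors indexed by a set, and the sets can be compared directly.
    """
    base = n - k
    tuples = list(product(range(base), repeat=k))
    pos = {t: m for m, t in enumerate(tuples)}

    def shifted(m, i, r):
        t = list(tuples[m])
        t[i - 1] = (t[i - 1] + r) % base
        return pos[tuple(t)]

    report = {}
    for l in range(1, k + 1):
        S = {m for m, t in enumerate(tuples) if t[l - 1] == 0}
        aligned = all({shifted(m, i, r) for m in S} == S
                      for i in range(1, k + 1) if i != l for r in range(base))
        covered = set()
        for r in range(base):
            covered |= {shifted(m, l, r) for m in S}
        report[l] = (sorted(m + 1 for m in S), aligned, covered == set(range(base ** k)))
    return report


for l, (rows, aligned, resolvable) in alignment_report(5, 3).items():
    print(f"node {l}: rows {rows}, aligned={aligned}, resolvable={resolvable}")
# node 1: rows [1, 2, 3, 4], aligned=True, resolvable=True
# node 2: rows [1, 2, 5, 6], aligned=True, resolvable=True
# node 3: rows [1, 3, 5, 7], aligned=True, resolvable=True
```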

While this shows that optimal repair is achieved, all that remains to be shown is that the code is an MDS code, i.e., Property 1. This is shown in Appendix A for the generalization of this code to arbitrary values of $(n, k)$. Next, we describe this generalization.

B. The optimal $(n, k)$ code

This is a natural generalization of the (5, 3) code for general values of $(n, k)$, with $\mathcal{C} = (n-k)^k$. To describe this generalization, we define the function

$$\chi_i(m) = (\phi_1(m), \phi_2(m), \ldots, \phi_{i-1}(m), \phi_i(m)\oplus 1, \phi_{i+1}(m), \phi_{i+2}(m), \ldots, \phi_k(m)),$$

where the operator $\oplus$ represents addition modulo $(n-k)$. In other words, $\chi_i(m)$ essentially modifies the $i$th position in the base $(n-k)$ representation of $m - 1$, by addition of 1 modulo $(n-k)$.

Remark 1: For the optimal (5, 3) code described previously, note that the $m$th row of $\mathbf{P}_i$ is $\mathbf{e}(\langle\chi_i(m)\rangle)$. In other words, for the (5, 3) code described above, the $m$th component of $\mathbf{P}_i\mathbf{a}$ is equal to $a(\langle\chi_i(m)\rangle)$.

Remark 2: $\langle\chi_i(1)\rangle, \langle\chi_i(2)\rangle, \ldots, \langle\chi_i((n-k)^k)\rangle$ is a permutation of $1, 2, \ldots, (n-k)^k$ for any $i \in \{1, 2, \ldots, k\}$. Therefore, given a $\mathcal{C}\times 1$ vector $\mathbf{a}$,

$$[a(\langle\chi_i(1)\rangle), a(\langle\chi_i(2)\rangle), \ldots, a(\langle\chi_i((n-k)^k)\rangle)]^T$$

is a permutation of $\mathbf{a}$. We will use this permutation to construct our codes.

In this code, we have $\mathcal{C} = \mathcal{M}/k = (n-k)^k$, so that the $k$ sources $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k$ are all $(n-k)^k\times 1$ vectors, and the coding sub-matrices are $(n-k)^k\times(n-k)^k$ matrices.

Consider the permutation matrix $\mathbf{P}_i$ defined as

$$\mathbf{P}_i = \begin{pmatrix}\mathbf{e}(\langle\chi_i(1)\rangle)\\ \mathbf{e}(\langle\chi_i(2)\rangle)\\ \vdots\\ \mathbf{e}(\langle\chi_i((n-k)^k)\rangle)\end{pmatrix}$$

for $i = 1, 2, \ldots, k$, where $\mathbf{e}(1), \mathbf{e}(2), \ldots, \mathbf{e}((n-k)^k)$ are the rows of the identity matrix $\mathbf{I}_{(n-k)^k}$. Note that because of Remark 2, the above matrix is indeed a permutation matrix. Then, the coding sub-matrices are defined as

$$\mathbf{C}_{j,i} = \lambda_{j,i}\mathbf{P}_i^{\,j-k-1}, \quad j = k+1, k+2, \ldots, n,\; i = 1, 2, \ldots, k, \tag{17}$$

with the scalars $\lambda_{j,i}$ picked randomly from the field $\mathbb{F}_q$. To understand the structure of the above permutation, consider an arbitrary column vector $\mathbf{a} = (a(1)\; a(2)\; \ldots\; a((n-k)^k))^T$. Let $j = \langle x_1, x_2, x_3, \ldots, x_k\rangle$ for $1 \leq j \leq (n-k)^k$. Then, the $j$th component of $\mathbf{P}_i\mathbf{a}$ is

$$a(\langle x_1, x_2, \ldots, x_{i-1}, x_i\oplus 1, x_{i+1}, \ldots, x_k\rangle).$$

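Thus, we can write

$$d_{k+r+1}(\langle x_1, x_2, \ldots, x_k\rangle) = \lambda_{k+r+1,1}\,a_1(\langle x_1\oplus r, x_2, x_3, \ldots, x_k\rangle) + \lambda_{k+r+1,2}\,a_2(\langle x_1, x_2\oplus r, x_3, \ldots, x_k\rangle) + \cdots + \lambda_{k+r+1,k}\,a_k(\langle x_1, x_2, x_3, \ldots, x_k\oplus r\rangle)$$

where $r \in \{0, 1, 2, \ldots, n-k-1\}$ and $x\oplus r$ denotes addition modulo $(n-k)$. This describes the coding sub-matrices.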

Now, in case of failure of node $l$, the rows of the repair matrix $\mathbf{V}_l$ are chosen from the set $\{\mathbf{e}(m) : \phi_l(m) = 0\}$. Since $\phi_l(m)$ can take $n - k$ values, this construction gives $\mathbf{V}_l$ exactly $\mathcal{C}/(n-k) = (n-k)^{k-1}$ rows, as required. Because of the construction, we have the following interference alignment relation for $j \in \{k+1, k+2, \ldots, n\}$ and $i \in \{1, 2, \ldots, k\} - \{l\}$:

$$\mathrm{rowspan}(\mathbf{V}_l\mathbf{C}_{j,i}) = \mathrm{rowspan}(\{\mathbf{e}(m) : \phi_l(m) = 0\}).$$

Further,

$$\mathrm{rowspan}(\mathbf{V}_l\mathbf{C}_{j,l}) = \mathrm{rowspan}(\{\mathbf{e}(m) : \phi_l(m) = j - k - 1\})$$

for $j \in \{k+1, k+2, \ldots, n\}$, so that (13) is satisfied and the desired signal can be separated from the interference. All that remains to be shown is the MDS property. This is shown in Appendix A.
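The general construction can likewise be exercised numerically. The sketch below is a minimal, hypothetical `PermutationCode` class over a prime field $\mathbb{F}_p$ (`p = 101` and the parameters $(n, k) = (7, 4)$ are arbitrary illustrative choices). It stores $\sum_i \lambda_{j,i}\mathbf{P}_i^{\,j-k-1}\mathbf{a}_i$ at parity node $j$, downloads only the symbols indexed by $\{m : \phi_l(m) = 0\}$ from every surviving node, and rebuilds $\mathbf{a}_l$ by cancelling the aligned interference.

```python
import random
from itertools import product


class PermutationCode:
    """Hypothetical helper: permutation code of Section V-B over a prime field F_p.

    Parity node j = k+1+r stores sum_i lam[j, i] * P_i^r a_i, where P_i^r adds r (mod n-k)
    to digit i of the index; systematic node i stores a_i.
    """

    def __init__(self, n, k, p, rng):
        self.n, self.k, self.p, self.base = n, k, p, n - k
        self.tuples = list(product(range(self.base), repeat=k))
        self.pos = {t: m for m, t in enumerate(self.tuples)}
        self.lam = {(k + 1 + r, i): rng.randrange(1, p)
                    for r in range(self.base) for i in range(1, k + 1)}

    def shift(self, m, i, r):
        """0-based index of the symbol of a_i that enters row m of P_i^r."""
        t = list(self.tuples[m])
        t[i - 1] = (t[i - 1] + r) % self.base
        return self.pos[tuple(t)]

    def encode(self, a):
        C = len(self.tuples)
        return {j: [sum(self.lam[j, i] * a[i][self.shift(m, i, j - self.k - 1)] for i in a) % self.p
                    for m in range(C)]
                for j in range(self.k + 1, self.n + 1)}

    def repair(self, l, a, d):
        """Rebuild a_l using only the symbols indexed by {m : phi_l(m) = 0} of every other node."""
        S = [m for m, t in enumerate(self.tuples) if t[l - 1] == 0]
        sys_dl = {i: {m: a[i][m] for m in S} for i in a if i != l}
        par_dl = {j: {m: d[j][m] for m in S} for j in d}
        out = [None] * len(self.tuples)
        for j, col in par_dl.items():
            r = j - self.k - 1
            inv = pow(self.lam[j, l], self.p - 2, self.p)
            for m in S:
                # Interference from a_i (i != l) stays inside S, so it is known and cancelled.
                interference = sum(self.lam[j, i] * sys_dl[i][self.shift(m, i, r)] for i in sys_dl)
                out[self.shift(m, l, r)] = (col[m] - interference) * inv % self.p
        symbols = sum(map(len, sys_dl.values())) + sum(map(len, par_dl.values()))
        return out, symbols


rng = random.Random(0)
code = PermutationCode(n=7, k=4, p=101, rng=rng)
a = {i: [rng.randrange(101) for _ in code.tuples] for i in range(1, 5)}
d = code.encode(a)
for l in range(1, 5):
    rebuilt, symbols = code.repair(l, a, d)
    print(l, rebuilt == a[l], symbols)   # 1 True 162, and likewise for l = 2, 3, 4
```

For $(n, k) = (7, 4)$, each repair reads 162 symbols, i.e., 2 units of 81 symbols, which equals $(n-1)/(n-k) = 6/3$.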

VI. Explicit Construction of Codes for $n - k \in \{2, 3\}$

While, theoretically, any $(n, k)$ MDS code could be used to build distributed storage systems, in practice, the case of having a small number of parity nodes, i.e., small values of $n - k$, is of particular interest. In fact, a significant portion of the literature on the use of codes for storage systems is devoted to building codes for the cases of $(n-k) \in \{2, 3\}$ with desirable properties (see, for example, [23]-[26]). While these references focused on constructing MDS codes with efficient encoding and decoding properties, here we study the construction of MDS codes for $n - k \in \{2, 3\}$ with desirable repair properties.

In the previous section, we provided random code constructions based on permutation matrices. In this section, we further strengthen our constructions by providing explicit code constructions for the important case of $n - k \in \{2, 3\}$. Note that the codes constructed earlier were random constructions because the scalars $\lambda_{j,i}$ were picked randomly from the field. Further, note that, as long as $\lambda_{j,i}$, $j = k+1, k+2, \ldots, n$, $i = 1, 2, \ldots, k$, are any set of non-zero scalars, the repair bandwidth for failure of a single systematic node is $\frac{n-1}{n-k}$ units, as required. The randomness of the scalars $\lambda_{j,i}$ was used in the previous section to show the existence of codes which satisfy the MDS property. In this section, for the two cases of $n - k = 2$ and $n - k = 3$, we choose these scalars explicitly (i.e., not randomly) so that the MDS property is satisfied. For both cases, the scalars $\lambda_{j,i}$ are chosen as

$$\lambda_{j,i} = \lambda_i^{\,j-k-1} \tag{18}$$

so that we have

$$\mathbf{C}_{k+j,i} = (\lambda_i\mathbf{P}_i)^{j-1}$$

for $j = 1, 2, \ldots, n-k$, $i = 1, 2, \ldots, k$.

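If $n - k = 2$, we choose $q > 2k + 1$ and choose non-zero scalars $\lambda_1, \lambda_2, \ldots, \lambda_k$ from the field so that

$$\lambda_i \neq \lambda_j, \quad \lambda_i + \lambda_j \neq 0, \quad \text{for } i \neq j.$$

Note that in a field of size $2k + 1$ or bigger, scalars satisfying the above can be chosen by ensuring that $\lambda_i + x = 0 \Rightarrow x \notin \{\lambda_1, \lambda_2, \ldots, \lambda_k\}$, i.e., by picking at most one element from each pair $\{x, -x\}$ of non-zero field elements. With this choice of scalars, in Appendix B we show that the code satisfies the MDS property.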
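A brute-force check of the explicit choice for $n - k = 2$ is feasible for small parameters. The Python sketch below assumes $k = 3$, $\mathbf{C}_{4,i} = \mathbf{I}$, $\mathbf{C}_{5,i} = \lambda_i\mathbf{P}_i$, and the prime field $\mathbb{F}_{11}$ (all illustrative choices; the helper names are hypothetical). It tests whether every set of $k$ nodes has a full-rank $24\times 24$ generator sub-matrix. The scalars $(1, 2, 3)$ satisfy the two conditions above, while $(1, 10, 3)$ violates $\lambda_i + \lambda_j \neq 0$.

```python
from itertools import combinations, product


def rank_mod_p(M, p):
    """Rank over the prime field F_p (Gauss-Jordan elimination)."""
    A = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        piv = next((r for r in range(rank, len(A)) if A[r][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], p - 2, p)
        A[rank] = [x * inv % p for x in A[rank]]
        for r in range(len(A)):
            if r != rank and A[r][c]:
                f = A[r][c]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank


def node_rows(lams, p, k):
    """Generator rows of each node for n - k = 2: C_{k+1,i} = I and C_{k+2,i} = lam_i P_i."""
    tuples = list(product(range(2), repeat=k))
    pos = {t: m for m, t in enumerate(tuples)}
    C = len(tuples)
    eye = [[int(r == c) for c in range(C)] for r in range(C)]
    zero = [[0] * C for _ in range(C)]

    def scaled_perm(i):              # lam_i P_i: row m has lam_i in column <chi_i(m)>
        B = [[0] * C for _ in range(C)]
        for m, t in enumerate(tuples):
            s = list(t)
            s[i - 1] ^= 1
            B[m][pos[tuple(s)]] = lams[i - 1] % p
        return B

    def hcat(blocks):                # place C x C blocks side by side
        return [sum((b[r] for b in blocks), []) for r in range(C)]

    rows = {i: hcat([eye if j == i else zero for j in range(1, k + 1)]) for i in range(1, k + 1)}
    rows[k + 1] = hcat([eye] * k)
    rows[k + 2] = hcat([scaled_perm(i) for i in range(1, k + 1)])
    return rows


def is_mds(lams, p, k=3):
    """True if every k of the k + 2 nodes determine all k * 2^k source symbols."""
    rows = node_rows(lams, p, k)
    return all(rank_mod_p(sum((rows[j] for j in sub), []), p) == k * 2 ** k
               for sub in combinations(rows, k))


print(is_mds((1, 2, 3), 11))    # True
print(is_mds((1, 10, 3), 11))   # False, since 1 + 10 = 0 in F_11
```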

For $n - k = 3$, we choose $\lambda_1, \lambda_2, \ldots, \lambda_k$ to be $k$ non-zero elements in the field $\mathbb{F}_q$, where $q > 2k + 1$ is a prime, so that

A, •• A,. A, • A., , (I (19) 
for all $i \neq j \in \{1, 2, \ldots, k\}$. Note that elements $\lambda_i$ satisfying the above conditions can be chosen if $q > 2k + 1$. In Appendix B, we also show that the code described here for the case of $n - k = 3$ is an MDS code.

VII. A Subspace Interference Alignment Framework for Optimal Repair 

In this section, we return to Problem 2 described in Section III. Before we consider this problem, we summarize some properties of tensor (Kronecker) products below; the notation $\otimes$ is used to denote the tensor (Kronecker) product between two matrices.

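• Mixed Product Property:

$$(\mathbf{P}_1\otimes\mathbf{P}_2\otimes\cdots\otimes\mathbf{P}_m)(\mathbf{Q}_1\otimes\mathbf{Q}_2\otimes\cdots\otimes\mathbf{Q}_m) = (\mathbf{P}_1\mathbf{Q}_1)\otimes(\mathbf{P}_2\mathbf{Q}_2)\otimes\cdots\otimes(\mathbf{P}_m\mathbf{Q}_m)$$

• Invariance w.r.t. span:

If all the factors of a tensor product align with the corresponding factors of another tensor product, then the corresponding products also align, and vice versa. Formally, let $\mathbf{P}_i, \mathbf{Q}_i$, $i = 1, 2, \ldots, m$, be matrices such that the dimension of $\mathbf{P}_i$ is equal to the dimension of $\mathbf{Q}_i$. Then, $\mathrm{rowspan}(\mathbf{P}_i) = \mathrm{rowspan}(\mathbf{Q}_i) \neq \{\mathbf{0}\}$, $i = 1, 2, \ldots, m$, if and only if

$$\mathrm{rowspan}(\mathbf{P}_1\otimes\mathbf{P}_2\otimes\cdots\otimes\mathbf{P}_m) = \mathrm{rowspan}(\mathbf{Q}_1\otimes\mathbf{Q}_2\otimes\cdots\otimes\mathbf{Q}_m),$$

where $\mathbf{0}$ represents the row vector whose entries are all equal to 0.

• Inheritance of linear independence:

If the rows of one of the factors of a tensor product are linearly independent of the rows of the corresponding factor in another tensor product, then the rows of the corresponding products are also linearly independent. More formally, let $\mathbf{P}_i, \mathbf{Q}_i$, $i = 1, 2, \ldots, m$, be matrices such that the dimension of $\mathbf{P}_i$ is equal to the dimension of $\mathbf{Q}_i$. Now, suppose that $\mathrm{rowspan}(\mathbf{P}_l)\cap\mathrm{rowspan}(\mathbf{Q}_l) = \{\mathbf{0}\}$ for some $l \in \{1, 2, \ldots, m\}$, i.e., each row of $\mathbf{P}_l$ is linearly independent of all the rows of $\mathbf{Q}_l$ for some $l \in \{1, 2, \ldots, m\}$. Then,

$$\mathrm{rowspan}(\mathbf{P}_1\otimes\mathbf{P}_2\otimes\cdots\otimes\mathbf{P}_m)\cap\mathrm{rowspan}(\mathbf{Q}_1\otimes\mathbf{Q}_2\otimes\cdots\otimes\mathbf{Q}_m) = \{\mathbf{0}\}$$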
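The three properties can be sanity-checked numerically. The sketch below uses NumPy over the real numbers (an assumption for convenience; the paper works over $\mathbb{F}_q$), with small random integer matrices so that the mixed product check is exact. The helpers `kron_all`, `same_rowspan` and `trivial_intersection` are our own and compare row spaces through ranks. Such checks illustrate instances of the properties; they are not proofs.

```python
from functools import reduce

import numpy as np


def kron_all(mats):
    """P_1 ⊗ P_2 ⊗ ... ⊗ P_m, with np.kron applied left to right."""
    return reduce(np.kron, mats)


def same_rowspan(A, B):
    r = np.linalg.matrix_rank
    return r(A) == r(B) == r(np.vstack([A, B]))


def trivial_intersection(A, B):
    r = np.linalg.matrix_rank
    return r(np.vstack([A, B])) == r(A) + r(B)


rng = np.random.default_rng(0)

# Mixed product property, checked exactly on random integer matrices.
P = [rng.integers(-3, 4, size=(2, 2)) for _ in range(3)]
Q = [rng.integers(-3, 4, size=(2, 2)) for _ in range(3)]
mixed = np.array_equal(kron_all(P) @ kron_all(Q), kron_all([p @ q for p, q in zip(P, Q)]))

# Invariance w.r.t. span: u G = (2 0) spans the same line as u, so the products align.
u, G = np.array([[1, 0]]), np.array([[2, 0], [0, 3]])
I2 = np.eye(2, dtype=int)
invariant = same_rowspan(kron_all([u, I2, I2]), kron_all([u @ G, I2, I2]))

# Inheritance of linear independence: one independent factor suffices.
w = np.array([[0, 1]])
inherited = trivial_intersection(kron_all([u, I2, I2]), kron_all([w, I2 @ G, I2]))

print(mixed, invariant, inherited)   # True True True
```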

The second and third properties above follow as a result of the bilinearity and associativity of tensor products. The above properties were used in [18] to develop a type of interference alignment called subspace interference alignment in the context of cellular networks. In subspace interference alignment, the invariance of tensor products w.r.t. span plays a central role in ensuring that interference aligns, and the inheritance of linear independence plays a central role in ensuring that desired signals are linearly independent of the interference. This intuition recurs in our application of the concept here. We apply this idea of subspace interference alignment in the context of the repair problem; specifically, we use it in the context of Problem 2 in Section III. We use $N = 3$ here to demonstrate the main idea; the framework developed here can be used to solve the problem for any $N \in \mathbb{N}$. For the convenience of the reader, equations (6)-(8) associated with the problem are restated here (albeit in a slightly different, but equivalent, form).

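$$\mathrm{rowspan}(\mathbf{V}_i\mathbf{H}_j) = \mathrm{rowspan}(\mathbf{V}_i), \quad j \in \{1, 2, 3\} - \{i\} \tag{20}$$
$$\mathrm{rowspan}(\mathbf{V}_i)\cap\mathrm{rowspan}(\mathbf{V}_i\mathbf{H}_i) = \{\mathbf{0}\} \tag{21}$$
$$\mathrm{rank}(\mathbf{V}_i) = \mathrm{rank}(\mathbf{H}_i)/2 = L/2 \tag{22}$$

where $\mathbf{0}$ is the $1\times L$ row vector of zeros. To recollect, as shown in Section IV-B, a solution to the above problem can lead to an $n = N + 2$, $k = N$ code by choosing coding sub-matrices $\mathbf{C}_{k+1,i} = \mathbf{I}$ and $\mathbf{C}_{k+2,i} = \mathbf{H}_i$ for $i = 1, 2, \ldots, k$.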

A. Simplifying the above problem 

In the remainder of this section, we will use the properties of tensor products listed above to simplify the above problem to the following: find $\mathbf{U}_0, \mathbf{G}_0, \mathbf{G}_1$ such that

$$\mathrm{rowspan}(\mathbf{U}_0\mathbf{G}_0) = \mathrm{rowspan}(\mathbf{U}_0) \tag{23}$$
$$\mathrm{rowspan}(\mathbf{U}_0)\cap\mathrm{rowspan}(\mathbf{U}_0\mathbf{G}_1) = \{\mathbf{0}\} \tag{24}$$

where $\mathbf{U}_0$ is a $1\times 2$ row vector, $\mathbf{G}_0, \mathbf{G}_1$ are $2\times 2$ matrices and $\mathbf{0}$ is a $1\times 2$ vector of zeros. In other words, the problem of finding $N$ matrices $\mathbf{V}_i, \mathbf{H}_i$, $i = 1, 2, \ldots, N$, satisfying all the relations represented in (20)-(22) can be simplified into finding three matrices $\mathbf{U}_0, \mathbf{G}_0, \mathbf{G}_1$ satisfying (23)-(24). Note that finding $\mathbf{U}_0, \mathbf{G}_0, \mathbf{G}_1$ satisfying the above is straightforward, and the eigenvector approach used in Problem 1 in Section III works, i.e., we can pick the matrices so that $\mathbf{G}_0$ and $\mathbf{G}_1$ do not have a common eigenvector and pick $\mathbf{U}_0$ to be an eigenvector of $\mathbf{G}_0$. As we show next, we use tensor products to "stitch together" $N$ independent instances of the simpler problem of satisfying (23)-(24) to find matrices satisfying (20)-(22).

In our solution to (20)-(22), we have $L = 2^N = 8$. Suppose we restrict the matrices in (20)-(22) to have the following structure:

$$\mathbf{H}_i = \lambda_i(\mathbf{G}_{i,1}\otimes\mathbf{G}_{i,2}\otimes\mathbf{G}_{i,3})$$

where $\mathbf{G}_{i,j}$ is a $2\times 2$ full rank matrix for $j = 1, 2, 3$, $i = 1, 2, 3$, and $\lambda_i$ is some non-zero scalar over the field $\mathbb{F}_q$. Note that $\mathbf{H}_i$ has a dimension of $8\times 8$ and a rank of 8, as required, with the full rank property coming from the fact that $\mathbf{G}_{i,j}$, $j = 1, 2, 3$, each has a rank of 2. We also choose

$$\mathbf{V}_i = \mathbf{U}_{i,1}\otimes\mathbf{U}_{i,2}\otimes\mathbf{U}_{i,3}$$

where $\mathbf{U}_{1,1}, \mathbf{U}_{2,2}, \mathbf{U}_{3,3}$ are $1\times 2$ row vectors, and $\mathbf{U}_{i,j}$ for $i \neq j$ are $2\times 2$ matrices having a full rank of 2. Note that, with this choice of dimensions, $\mathbf{V}_i$ has a dimension of $4\times 8$, as required. Now, we intend to choose matrices $\mathbf{G}_{i,j}, \mathbf{U}_{i,j}$, $i, j \in \{1, 2, 3\}$, to satisfy (20) and (21). We choose these matrices to satisfy

$$\mathrm{rowspan}(\mathbf{U}_{i,i}\mathbf{G}_{j,i}) = \mathrm{rowspan}(\mathbf{U}_{i,i}), \quad i \neq j \tag{25}$$
$$\mathrm{rowspan}(\mathbf{U}_{i,i}\mathbf{G}_{i,i})\cap\mathrm{rowspan}(\mathbf{U}_{i,i}) = \{\mathbf{0}\} \tag{26}$$

for $i, j = 1, 2, 3$. In other words, the $1\times 2$ row vector $\mathbf{U}_{i,i}$ is invariant w.r.t. $\mathbf{G}_{j,i}$, $j \neq i$, but is linearly independent of $\mathbf{U}_{i,i}\mathbf{G}_{i,i}$. The $2\times 2$ matrices $\mathbf{U}_{i,j}$, $i \neq j$, can be chosen to be arbitrary full rank matrices. Equation (25) ensures that (20) is satisfied by using the invariance of tensor products w.r.t. span, i.e., by ensuring that each of the $N = 3$ factors on the left hand side of (20) aligns with the space spanned by the corresponding factor on the right hand side. To see this, note the following.

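$$\begin{aligned}
\mathrm{rowspan}(\mathbf{V}_1\mathbf{H}_2) &= \mathrm{rowspan}\big((\mathbf{U}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})(\mathbf{G}_{2,1}\otimes\mathbf{G}_{2,2}\otimes\mathbf{G}_{2,3})\big)\\
&\overset{(a)}{=} \mathrm{rowspan}(\mathbf{U}_{1,1}\mathbf{G}_{2,1}\otimes\mathbf{U}_{1,2}\mathbf{G}_{2,2}\otimes\mathbf{U}_{1,3}\mathbf{G}_{2,3})\\
&\overset{(b)}{=} \mathrm{rowspan}(\mathbf{U}_{1,1}\otimes\mathbf{U}_{1,2}\mathbf{G}_{2,2}\otimes\mathbf{U}_{1,3}\mathbf{G}_{2,3})\\
&\overset{(c)}{=} \mathrm{rowspan}(\mathbf{U}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})\\
&= \mathrm{rowspan}(\mathbf{V}_1)
\end{aligned}$$

(a) follows from the Mixed Product Property of tensor products. (b) follows from (25) and the invariance of the tensor product w.r.t. span. Similarly, (c) follows from the invariance of tensor products w.r.t. span and the fact that $\mathrm{rowspan}(\mathbf{U}_{i,j}) = \mathrm{rowspan}(\mathbf{U}_{i,j}\mathbf{G}_{m,n})$ for $i \neq j$, which, in turn, follows from the fact that $\mathbf{U}_{i,j}$ and $\mathbf{G}_{m,n}$ are full rank matrices for $i \neq j$. Thus, (25) ensures that (20) is satisfied for all $i = 1, 2, \ldots, N$.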

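Similarly, we show below that as long as (26) holds, equation (21) is satisfied because of the inheritance of linear independence property of tensor products.

$$\begin{aligned}
\mathrm{rowspan}(\mathbf{V}_1\mathbf{H}_1) &= \mathrm{rowspan}\big((\mathbf{U}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})(\mathbf{G}_{1,1}\otimes\mathbf{G}_{1,2}\otimes\mathbf{G}_{1,3})\big)\\
&= \mathrm{rowspan}(\mathbf{U}_{1,1}\mathbf{G}_{1,1}\otimes\mathbf{U}_{1,2}\mathbf{G}_{1,2}\otimes\mathbf{U}_{1,3}\mathbf{G}_{1,3})\\
&= \mathrm{rowspan}(\mathbf{U}_{1,1}\mathbf{G}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})\\
\Longrightarrow \mathrm{rowspan}(\mathbf{V}_1\mathbf{H}_1)\cap\mathrm{rowspan}(\mathbf{V}_1) &= \mathrm{rowspan}(\mathbf{U}_{1,1}\mathbf{G}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})\cap\mathrm{rowspan}(\mathbf{U}_{1,1}\otimes\mathbf{U}_{1,2}\otimes\mathbf{U}_{1,3})\\
&= \{\mathbf{0}\}
\end{aligned}$$

where the final equation follows from (26) and the inheritance of linear independence into tensor products. Now, we have reduced the task of finding matrices satisfying (20)-(22) to finding matrices satisfying (25)-(26). Suppose we set

$$\mathbf{U}_{1,1} = \mathbf{U}_{2,2} = \mathbf{U}_{3,3} = \mathbf{U}_0,$$
$$\mathbf{G}_{1,1} = \mathbf{G}_{2,2} = \mathbf{G}_{3,3} = \mathbf{G}_1,$$
$$\mathbf{U}_{i,j} = \mathbf{U}_1, \quad \mathbf{G}_{i,j} = \mathbf{G}_0, \quad i \neq j.$$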

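Now equations (25)-(26) essentially boil down to finding matrices such that

$$\mathrm{rowspan}(\mathbf{U}_0\mathbf{G}_0) = \mathrm{rowspan}(\mathbf{U}_0) \tag{27}$$
$$\mathrm{rowspan}(\mathbf{U}_0\mathbf{G}_1)\cap\mathrm{rowspan}(\mathbf{U}_0) = \{\mathbf{0}\} \tag{28}$$

and $\mathbf{U}_1$ can be any full rank $2\times 2$ matrix. Thus, the simplification of Problem 2 of Section III is complete (at least, for $N = 3$). As discussed before, the eigenvector approach illustrated for Problem 1 in Section III suffices for finding $\mathbf{U}_0, \mathbf{G}_0, \mathbf{G}_1$ satisfying the above relations. In fact, to obtain the (5, 3) permutation-based coding sub-matrices described previously, we choose

$$\mathbf{U}_0 = \begin{pmatrix}1 & 0\end{pmatrix}, \quad \mathbf{G}_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \quad \mathbf{U}_1 = \mathbf{G}_0 = \mathbf{I}_2,$$

so that $\mathbf{U}_{i,j} = \mathbf{G}_{i,j} = \mathbf{I}_2$ for $i, j \in \{1, 2, 3\}$, $i \neq j$.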

It can be noticed that the matrices $\mathbf{U}_0, \mathbf{G}_0, \mathbf{G}_1, \mathbf{U}_1$ satisfy (25)-(26). Further, in general, any choice of matrices which satisfies (27)-(28), and hence (20)-(22), would solve Problem 2 and, hence, can be used for codes with optimal repair bandwidth for distributed storage, for $n - k = 2$. For example, we could alternately choose the matrices inspired by ergodic alignment [17]. These matrices are shown below:

$$\mathbf{U}_0 = \begin{pmatrix}1 & -1\end{pmatrix}, \quad \mathbf{G}_1 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}, \quad \mathbf{G}_{i,j} = \mathbf{G}_0 = \mathbf{I}_2, \; i, j \in \{1, 2, 3\}, i \neq j,$$

and $\mathbf{U}_1$ to be any arbitrary full rank $2\times 2$ matrix. In fact, this choice of matrices has been studied for efficiently repairable code constructions in [27].
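The $N = 3$ construction can be checked mechanically. The NumPy sketch below builds $\mathbf{V}_i$ and $\mathbf{H}_i$ from the Kronecker framework with $\mathbf{U}_1 = \mathbf{G}_0 = \mathbf{I}_2$ and $\lambda_i = 1$ (scalars do not affect row spaces) and tests (20)-(22) through rank computations over the reals. It covers the permutation choice above and the Hadamard-like pair $\mathbf{U}_0 = (1\; -1)$, $\mathbf{G}_1 = \mathrm{diag}(1, -1)$; the latter is our reading of the ergodic-alignment choice and should be treated as an assumption. The function name `framework` is hypothetical.

```python
from functools import reduce

import numpy as np


def rank(M):
    return np.linalg.matrix_rank(M)


def framework(U0, G1, N=3):
    """Build V_i, H_i with U_1 = G_0 = I_2 (and lambda_i = 1) and test (20), (21), (22)."""
    I2 = np.eye(2)
    V = [reduce(np.kron, [U0 if t == i else I2 for t in range(N)]) for i in range(N)]
    H = [reduce(np.kron, [G1 if t == i else I2 for t in range(N)]) for i in range(N)]
    eq20 = all(rank(np.vstack([V[i], V[i] @ H[j]])) == rank(V[i])
               for i in range(N) for j in range(N) if j != i)
    eq21 = all(rank(np.vstack([V[i], V[i] @ H[i]])) == 2 * rank(V[i]) for i in range(N))
    eq22 = all(rank(V[i]) == 2 ** N // 2 and rank(H[i]) == 2 ** N for i in range(N))
    return (eq20, eq21, eq22), H


perm_choice = (np.array([[1.0, 0.0]]), np.array([[0.0, 1.0], [1.0, 0.0]]))
hadamard_choice = (np.array([[1.0, -1.0]]), np.diag([1.0, -1.0]))

for name, (U0, G1) in (("permutation", perm_choice), ("Hadamard-like", hadamard_choice)):
    checks, H = framework(U0, G1)
    print(name, checks)
# permutation (True, True, True)
# Hadamard-like (True, True, True)
```

For the permutation choice, $\mathbf{H}_1 = \mathbf{G}_1\otimes\mathbf{I}_2\otimes\mathbf{I}_2$ coincides with the matrix $\mathbf{P}_1$ of the (5, 3) code, which is a quick way to see the connection to Section V.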

B. Discussion 

• For $N = 3$, we used $L = 2^3$ and expressed $\mathbf{H}_i$ as a Kronecker product of $N = 3$ matrices. For an arbitrary $N$, we can extend the above framework by expressing $\mathbf{H}_i$ as a Kronecker product of $N$ $2\times 2$ matrices so that $L = 2^N$. $\mathbf{V}_i$ is also, similarly, a Kronecker product of $N$ matrices, such that the $i$th matrix is a $1\times 2$ matrix, and the remaining $N - 1$ matrices participating in the Kronecker product are $2\times 2$ matrices.

• Because of Section IV-B, the subspace interference alignment framework here can be used to generate $(k+2, k)$ codes which can be repaired by downloading $1/2$ the data stored in every surviving node. This is because equation (20) ensures that the interference is aligned, and (21) ensures that the lost (desired) symbols can be reconstructed from the downloaded data. However, this framework does not ensure Property 1, i.e., it does not ensure that the code generated is MDS. The MDS property can be ensured by choosing the scalars $\lambda_i$ randomly over the field and using the Schwartz-Zippel Lemma along the same lines as the proof in Appendix A. In other words, the Schwartz-Zippel Lemma ensures that there exists at least one choice of scalars $\lambda_i$ so that the code is an MDS code.

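• The problems motivated in Section III and solved in this section are related to optimal repair of failed systematic nodes in a distributed storage system with $n - k = 2$ parity nodes. In general, if $n - k > 2$, the framework developed here can be used to show that the problem of finding repair-bandwidth optimal MDS codes can be decomposed into the problem of finding full rank $(n-k)\times(n-k)$ matrices $\mathbf{G}_0, \mathbf{G}_1, \ldots, \mathbf{G}_{n-k-1}$ and a $1\times(n-k)$ dimensional row vector $\mathbf{U}_0$ such that

$$\mathrm{rowspan}(\mathbf{U}_0\mathbf{G}_0) = \mathrm{rowspan}(\mathbf{U}_0) \tag{29}$$
$$\mathrm{rank}\begin{pmatrix}\mathbf{U}_0\\ \mathbf{U}_0\mathbf{G}_1\\ \vdots\\ \mathbf{U}_0\mathbf{G}_{n-k-1}\end{pmatrix} = n - k \tag{30}$$

With a solution to the above problem, the coding sub-matrices can be chosen as $\mathbf{C}_{k+1,j} = \mathbf{I}$ and, for $m > 1$,

$$\mathbf{C}_{k+m,j} = \underbrace{\mathbf{G}_0\otimes\cdots\otimes\mathbf{G}_0}_{(j-1)\text{ times}}\otimes\,\mathbf{G}_{m-1}\otimes\underbrace{\mathbf{G}_0\otimes\cdots\otimes\mathbf{G}_0}_{(k-j)\text{ times}}.$$

The repair matrices $\mathbf{V}_j$ can be obtained as

$$\mathbf{V}_j = \underbrace{\mathbf{U}_1\otimes\cdots\otimes\mathbf{U}_1}_{(j-1)\text{ times}}\otimes\,\mathbf{U}_0\otimes\underbrace{\mathbf{U}_1\otimes\cdots\otimes\mathbf{U}_1}_{(k-j)\text{ times}}$$

where $\mathbf{U}_1$ is a full rank $(n-k)\times(n-k)$ matrix. The permutation matrices used for code development in Section V can be interpreted as one solution to equations (29)-(30).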
• It is worth noting that the framework developed in this section can be used to generate codes that are optimal from the perspective of the repair bandwidth. However, the codes developed need not be optimal from the perspective of the amount of disk access in the storage system. The codes of Section V, which fit within this framework, satisfy the additional property of being optimal from the perspective of the amount of disk access.
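As one concrete instance of (29)-(30), the sketch below assumes $\mathbf{G}_m = \mathbf{S}^m$, where $\mathbf{S}$ is the cyclic shift on $\{0, \ldots, n-k-1\}$, together with $\mathbf{U}_0 = (1\; 0\; \cdots\; 0)$ and $\mathbf{U}_1 = \mathbf{I}$. It builds the Kronecker-structured $\mathbf{C}_{k+m,j}$ and $\mathbf{V}_j$ with NumPy and checks that $\mathbf{C}_{k+m,j}$ coincides with $\mathbf{P}_j^{m-1}$ from Section V for $(n, k) = (6, 3)$. The helper names are hypothetical.

```python
from functools import reduce
from itertools import product

import numpy as np


def framework_code(n, k):
    """Kronecker construction of (29)-(30), assuming G_m = S^m (S: cyclic shift), U_0 = (1 0 ... 0)."""
    q = n - k
    S = np.roll(np.eye(q, dtype=int), 1, axis=1)           # row x has its 1 in column (x + 1) mod q
    G = [np.linalg.matrix_power(S, m) for m in range(q)]    # G_0 = I
    U0, U1 = np.eye(q, dtype=int)[:1], np.eye(q, dtype=int)
    eq29 = np.array_equal(U0 @ G[0], U0)
    eq30 = np.linalg.matrix_rank(np.vstack([U0 @ Gm for Gm in G])) == q

    def C(m, j):   # C_{k+m, j}: G_{m-1} in Kronecker position j, G_0 elsewhere
        return reduce(np.kron, [G[m - 1] if t == j else G[0] for t in range(1, k + 1)])

    def V(j):      # V_j: U_0 in Kronecker position j, U_1 elsewhere
        return reduce(np.kron, [U0 if t == j else U1 for t in range(1, k + 1)])

    return bool(eq29), bool(eq30), C, V


def perm_power(n, k, i, r):
    """P_i^r of Section V: adds r (mod n - k) to digit i of the index tuple."""
    base = n - k
    tuples = list(product(range(base), repeat=k))
    pos = {t: m for m, t in enumerate(tuples)}
    P = np.zeros((len(tuples), len(tuples)), dtype=int)
    for m, t in enumerate(tuples):
        s = list(t)
        s[i - 1] = (s[i - 1] + r) % base
        P[m, pos[tuple(s)]] = 1
    return P


n, k = 6, 3
eq29, eq30, C, V = framework_code(n, k)
same = all(np.array_equal(C(m, j), perm_power(n, k, j, m - 1))
           for m in range(1, n - k + 1) for j in range(1, k + 1))
print(eq29, eq30, same, V(1).shape)   # True True True (9, 27)
```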

VIII. Conclusion 

In this paper, we construct a class of MDS codes based on subspace interference alignment with optimal repair bandwidth for a single failed systematic node. A class of our code constructions is optimal, not only in terms of repair bandwidth, but also in terms of the amount of disk access during the recovery of a single failed node. Since we effectively provide the first set of repair-bandwidth optimal MDS codes for arbitrary $(n, k)$, this work can be viewed as a stepping stone towards the implementation of MDS codes in distributed storage systems.

From the perspective of storage systems, there remain several unanswered questions. First, the existence of finite codes which can achieve efficient repair of parity nodes, in addition to systematic nodes, remains open. Second, we assume that the new node connects to all $d = n - 1$ surviving nodes in the system. An interesting question is whether finite code constructions can be found to conduct efficient repair when the new node is restricted to connect to a subset of the surviving nodes. While asymptotic constructions satisfying the lower bounds have been found for both these problems, the existence of finite codes satisfying these properties remains open. Finally, the search for repair strategies of existing codes, which is analogous to the search of interference alignment beamforming vectors for fixed channel matrices in the context of interference channels, remains open. While iterative techniques exist for the wireless context [28], [29], they cannot be directly extended to the storage context because of the discrete nature of the optimization problem in the latter context. Such algorithms, while explored in the context of certain classes of codes in [12], [13], remain an interesting area of future work.

Appendix A 
MDS Property 

We intend to show that the determinant of the matrix in (9) is a non-zero polynomial in $\boldsymbol{\lambda} = \{\lambda_{j,i} : j = k+1, k+2, \ldots, n,\; i = 1, 2, \ldots, k\}$ for any $j_1, j_2, \ldots, j_k \in \{1, 2, \ldots, n\}$. If we show this, then each MDS constraint corresponds to showing that a polynomial $P_{j_1, j_2, \ldots, j_k}(\boldsymbol{\lambda})$ is non-zero. Applying the Schwartz-Zippel Lemma to the product of these polynomials, $\prod_{j_1, j_2, \ldots, j_k} P_{j_1, j_2, \ldots, j_k}(\boldsymbol{\lambda})$, implies the existence of $\boldsymbol{\lambda}$ such that all the MDS constraints are satisfied, in a sufficiently large field. Therefore, all that remains to be shown is that the determinant of (9) is a non-zero polynomial in $\boldsymbol{\lambda}$. We will show this by showing that there exists at least one set of values for the variables $\boldsymbol{\lambda}$ such that the determinant of (9) is non-zero. To show this, we first assume, without loss of generality, that $j_1, j_2, \ldots, j_k$ are in ascending order. Also, let $j_1, j_2, \ldots, j_{k-m} \in \{1, 2, \ldots, k\}$ and $j_{k-m+1}, j_{k-m+2}, \ldots, j_k \in \{k+1, k+2, \ldots, n\}$. For simplicity, we will assume that $j_1 = 1, j_2 = 2, \ldots, j_{k-m} = k - m$. The proof for any other set $\{j_1, j_2, \ldots, j_{k-m}\}$ is almost identical to this case, except for a difference in the indices used henceforth. Substituting the appropriate values of $\mathbf{C}_{j,i}$, the matrix in (9) can be written as



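$$\begin{pmatrix} \mathbf{I}_{(k-m)\mathcal{C}} & \mathbf{0} \\ \begin{matrix} \lambda_{j_{k-m+1},1}\mathbf{P}_1^{s_{k-m+1}} & \cdots & \lambda_{j_{k-m+1},k-m}\mathbf{P}_{k-m}^{s_{k-m+1}} \\ \vdots & \ddots & \vdots \\ \lambda_{j_k,1}\mathbf{P}_1^{s_k} & \cdots & \lambda_{j_k,k-m}\mathbf{P}_{k-m}^{s_k} \end{matrix} & \begin{matrix} \lambda_{j_{k-m+1},k-m+1}\mathbf{P}_{k-m+1}^{s_{k-m+1}} & \cdots & \lambda_{j_{k-m+1},k}\mathbf{P}_{k}^{s_{k-m+1}} \\ \vdots & \ddots & \vdots \\ \lambda_{j_k,k-m+1}\mathbf{P}_{k-m+1}^{s_k} & \cdots & \lambda_{j_k,k}\mathbf{P}_{k}^{s_k} \end{matrix} \end{pmatrix} \tag{31}$$

where $s_i = j_i - k - 1$. Now, if

$$\lambda_{j,i} = \begin{cases} 0 & \text{if } (j, i) \notin \{(j_t, t) : t = k-m+1, k-m+2, \ldots, k\} \\ 1 & \text{otherwise,} \end{cases}$$

then the above matrix is a block diagonal matrix. Therefore, its determinant evaluates to the product of the determinants of its diagonal blocks, i.e., $\prod_{u=k-m+1}^{k} |\mathbf{P}_u^{s_u}|$, which is non-zero. This implies that the determinant in (9) is a non-zero polynomial in $\boldsymbol{\lambda}$, as required. This completes the proof.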

Appendix B 

Proof of MDS property for explicit constructions of Section VI

We need to show Property 1. Before we show this property, we begin with the following lemma, which shows that the coding sub-matrices in our constructions commute.

Lemma 2:

$$\mathbf{P}_i^{m_1}\mathbf{P}_j^{m_2} = \mathbf{P}_j^{m_2}\mathbf{P}_i^{m_1},$$

where $\mathbf{P}_i$ is chosen as in (17).

Proof: In order to show this, we show that PjPja = PjP;a for any 2 fc x 1 dimensional column vector a. 
Assuming without loss of generality that i < j, this can be seen by verifying that 



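$$P_i^{m_1}P_j^{m_2}\mathbf{a} = P_j^{m_2}P_i^{m_1}\mathbf{a} = \begin{bmatrix}
a(\langle \underbrace{0,\dots,0}_{i-1 \text{ entries}},\, 0\oplus m_1,\, \underbrace{0,\dots,0}_{j-i-1 \text{ entries}},\, 0\oplus m_2,\, 0,\dots,0\rangle)\\
a(\langle 1,\dots,1,\, 1\oplus m_1,\, 1,\dots,1,\, 1\oplus m_2,\, 1,\dots,1\rangle)\\
\vdots\\
a(\langle k-1,\dots,k-1,\, (k-1)\oplus m_1,\, k-1,\dots,k-1,\, (k-1)\oplus m_2,\, k-1,\dots,k-1\rangle)
\end{bmatrix}$$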
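In other words, the $\langle r_1, r_2, \dots, r_k\rangle$-th element of both $P_i^{m_1}P_j^{m_2}\mathbf{a}$ and $P_j^{m_2}P_i^{m_1}\mathbf{a}$ can be verified to be

$$a(\langle r_1, r_2, \dots, r_{i-1}, r_i \oplus m_1, r_{i+1}, \dots, r_{j-1}, r_j \oplus m_2, r_{j+1}, \dots, r_k\rangle).$$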
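Lemma 2 can also be confirmed by brute force for small parameters. This Python sketch assumes, consistently with the element-wise formula above, that $(P_i^{m}\mathbf{a})(\langle r_1,\dots,r_k\rangle) = a(\langle r_1, \dots, r_i \oplus m, \dots, r_k\rangle)$ with $\oplus$ taken as addition modulo $q = n-k$; the functions are hypothetical helpers, not part of the construction. Using a test vector with distinct entries means that equal outputs imply equal permutation matrices.

```python
from itertools import product


def shift_perm(k, q, i, m):
    """Index map of P_i^m (0-based i): entry r of P_i^m a is a[r with r_i -> (r_i + m) mod q].

    Assumption (illustrative): vectors are indexed by tuples in {0, ..., q-1}^k.
    """
    idx = list(product(range(q), repeat=k))
    pos = {r: t for t, r in enumerate(idx)}
    out = []
    for r in idx:
        s = list(r)
        s[i] = (s[i] + m) % q
        out.append(pos[tuple(s)])
    return out


def apply(perm, a):
    """Return P a, where row t of the permutation matrix P selects entry perm[t] of a."""
    return [a[x] for x in perm]


def lemma2_holds(k, q):
    """Check P_i^m1 P_j^m2 a == P_j^m2 P_i^m1 a for all i, j, m1, m2."""
    a = list(range(q ** k))  # distinct entries
    for i, j in product(range(k), repeat=2):
        for m1, m2 in product(range(q), repeat=2):
            Pi, Pj = shift_perm(k, q, i, m1), shift_perm(k, q, j, m2)
            if apply(Pi, apply(Pj, a)) != apply(Pj, apply(Pi, a)):
                return False
    return True


def element_formula_holds(k, q):
    """Check that entry <r> of P_i^m1 P_j^m2 a equals a(<..., r_i + m1, ..., r_j + m2, ...>)."""
    idx = list(product(range(q), repeat=k))
    pos = {r: t for t, r in enumerate(idx)}
    a = list(range(q ** k))
    for i, j in product(range(k), repeat=2):
        if i >= j:
            continue
        for m1, m2 in product(range(q), repeat=2):
            out = apply(shift_perm(k, q, i, m1), apply(shift_perm(k, q, j, m2), a))
            for r in idx:
                s = list(r)
                s[i] = (s[i] + m1) % q
                s[j] = (s[j] + m2) % q
                if out[pos[r]] != a[pos[tuple(s)]]:
                    return False
    return True


print(lemma2_holds(3, 2), lemma2_holds(3, 3), element_formula_holds(3, 3))
```

Representing each $P_i^{m}$ as an index map rather than a dense matrix keeps the check cheap, and it is equivalent because a permutation matrix is determined by where it sends each index.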



Now, we proceed to show the MDS property for $n-k \in \{2, 3\}$. Without loss of generality, we assume that $j_1, j_2, \dots, j_k$ are in ascending order.

Case 1: $n-k = 2$: We divide this case into two scenarios. In the first scenario, $j_1, j_2, \dots, j_{k-1} \in \{1, 2, \dots, k\}$ and $j_k \in \{k+1, k+2\}$. This corresponds to reconstructing the data from $k-1$ systematic nodes and a single parity node. Substituting this into the equation in Appendix A and expanding the determinant along the first $(k-1)C$ columns, we find that it equals $|C_{j_k,i}|$, where $i$ is the index of the systematic node not among $j_1, \dots, j_{k-1}$. Therefore, the desired property is equivalent to the matrix $C_{j,i} = (\lambda_i P_i)^{j-k-1}$ being full rank for all $j \in \{k+1, k+2, \dots, n\}$ and $i = 1, 2, \dots, k$. Since each such matrix is a power of a permutation matrix scaled by a non-zero $\lambda_i$, this scenario is trivial. In the second scenario, $j_1, j_2, \dots, j_{k-2} \in \{1, 2, \dots, k\}$, $j_{k-1} = k+1$ and $j_k = k+2$. This corresponds to reconstructing the original sources from $k-2$ systematic nodes and both parity nodes. By substituting into the same equation and expanding along the first $(k-2)C$ rows, the MDS property can be shown to be equivalent to showing that the matrix



$$\begin{bmatrix} I & I\\ \lambda_i P_i & \lambda_j P_j\end{bmatrix}$$

has full rank. Note that the matrices $P_i$ and $P_j$ commute by Lemma 2. Since the determinant of a block matrix with commuting blocks can be evaluated using the element-wise determinant expansion over blocks [30], the determinant of the matrix can be written as

$$\left|\lambda_j P_j - \lambda_i P_i\right| = \left|\lambda_j P_i\right|\,\left|P_j P_i^{-1} - \lambda_i\lambda_j^{-1} I\right|.$$

The above expression equals zero if and only if $\lambda_i\lambda_j^{-1}$ is an eigenvalue of the permutation matrix $P_jP_i^{-1}$. Here $P_jP_i^{-1}$ is a permutation matrix whose square is the identity matrix, so its only possible eigenvalues are the square roots of unity, i.e., $1$ and $-1$. By the choice of the $\lambda$'s in the construction, $\lambda_i \neq \pm\lambda_j$, so $\lambda_i\lambda_j^{-1} \neq 1$ and $\lambda_i\lambda_j^{-1} \neq -1$. Hence, the determinant above is non-zero and the matrix is full rank, as required.
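The eigenvalue argument of Case 1 can be illustrated over a small prime field. The sketch below assumes the same index-shift form of $P_i$ (adding 1 modulo $n-k = 2$ to the $i$-th index coordinate) and uses hypothetical helper names; it computes the rank of $\begin{bmatrix} I & I\\ \lambda_iP_i & \lambda_jP_j\end{bmatrix}$ over $\mathbb{F}_5$ directly by elimination, without using the determinant identity, so it serves as an independent check in the tried cases.

```python
from itertools import product


def shift_perm(k, q, i, m):
    """Index map of P_i^m (0-based i); assumption: add m mod q to index coordinate i."""
    idx = list(product(range(q), repeat=k))
    pos = {r: t for t, r in enumerate(idx)}
    out = []
    for r in idx:
        s = list(r)
        s[i] = (s[i] + m) % q
        out.append(pos[tuple(s)])
    return out


def perm_block(perm, scale, p):
    """Dense matrix scale * P over F_p; row t has its single entry in column perm[t]."""
    N = len(perm)
    M = [[0] * N for _ in range(N)]
    for t in range(N):
        M[t][perm[t]] = scale % p
    return M


def rank_mod_p(M, p):
    """Rank over the prime field F_p by Gaussian elimination."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(rank + 1, len(M)):
            f = M[r][c] % p
            if f:
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


def case1_scenario2_full_rank(k, lam, i, j, p):
    """Rank test of [[I, I], [lam_i P_i, lam_j P_j]] over F_p for n - k = 2 (0-based i, j)."""
    q, N = 2, 2 ** k
    I = perm_block(list(range(N)), 1, p)
    Ai = perm_block(shift_perm(k, q, i, 1), lam[i], p)
    Aj = perm_block(shift_perm(k, q, j, 1), lam[j], p)
    M = [r1 + r2 for r1, r2 in zip(I, I)] + [r1 + r2 for r1, r2 in zip(Ai, Aj)]
    return rank_mod_p(M, p) == 2 * N


# Over F_5 with k = 2: (1, 2) satisfies lambda_1 != +-lambda_2; 4 = -1 (mod 5) does not.
print(case1_scenario2_full_rank(2, [1, 2], 0, 1, 5))  # True
print(case1_scenario2_full_rank(2, [1, 4], 0, 1, 5))  # False: lambda_1 = -lambda_2
print(case1_scenario2_full_rank(2, [3, 3], 0, 1, 5))  # False: lambda_1 = lambda_2
```

The two failing choices correspond exactly to $\lambda_i\lambda_j^{-1} = -1$ and $\lambda_i\lambda_j^{-1} = 1$, the two eigenvalues of $P_jP_i^{-1}$.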

Case 2: $n-k = 3$: We divide this case into five scenarios, listed below.



1) $j_1, j_2, \dots, j_{k-1} \in \{1, 2, \dots, k\}$ and $j_k \in \{k+1, k+2, k+3\}$.

2) $j_1, j_2, \dots, j_{k-2} \in \{1, 2, \dots, k\}$ and $j_{k-1} = k+1$, $j_k = k+2$.

3) $j_1, j_2, \dots, j_{k-2} \in \{1, 2, \dots, k\}$ and $j_{k-1} = k+2$, $j_k = k+3$.

4) $j_1, j_2, \dots, j_{k-2} \in \{1, 2, \dots, k\}$ and $j_{k-1} = k+1$, $j_k = k+3$.

5) $j_1, j_2, \dots, j_{k-3} \in \{1, 2, \dots, k\}$ and $j_{k-2} = k+1$, $j_{k-1} = k+2$, $j_k = k+3$.



Note that $P_jP_i^{-1}$ is a matrix whose third power (cube) is the identity matrix, i.e., it is a permutation that decomposes into cycles of length 3, so its eigenvalues are cube roots of unity. In a finite field $\mathbb{F}_q$ with $q \not\equiv 1 \pmod 3$ (for example, a prime field $\mathbb{F}_p$ with $p \equiv 2 \pmod 3$), the equation $x^3 = 1$ has the single solution $x = 1$, so the only possible eigenvalue of this matrix is $1$. Consequently, the MDS property can be proved in the first two scenarios using arguments similar to Case 1. For the third scenario, again using arguments similar to Case 1, showing the MDS property is equivalent to showing that the matrix



$$\begin{bmatrix} \lambda_i P_i & \lambda_j P_j\\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2\end{bmatrix}$$



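has full rank. This holds because the above matrix is equal to

$$\begin{bmatrix} I & I\\ \lambda_i P_i & \lambda_j P_j\end{bmatrix}\begin{bmatrix} \lambda_i P_i & 0\\ 0 & \lambda_j P_j\end{bmatrix},$$

and both matrices in this product have full rank. Now, for the fourth scenario, we need to show that the matrix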

$$\begin{bmatrix} I & I\\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2\end{bmatrix}$$

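has full rank. This can be seen by noting that the determinant of the above matrix evaluates to

$$\left|\lambda_j^2 P_j^2 - \lambda_i^2 P_i^2\right| = \left|\lambda_j^2 P_i^2\right|\,\left|P_j^2 P_i^{-2} - \lambda_i^2\lambda_j^{-2} I\right|,$$

which is non-zero if $\lambda_i^2\lambda_j^{-2} \neq 1$, again because $P_j^2P_i^{-2}$ is a matrix whose eigenvalues are cube roots of unity (and hence, in the chosen field, equal to 1). The conditions on the $\lambda$'s in the construction ensure that $\lambda_i^2 \neq \lambda_j^2$. Finally, we consider scenario 5, where we need to show that all the information can be recovered from $k-3$ systematic nodes and all 3 parity nodes. For this, we need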



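$$\begin{bmatrix} I & I & I\\ \lambda_i P_i & \lambda_j P_j & \lambda_l P_l\\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2 & \lambda_l^2 P_l^2\end{bmatrix}$$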



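to have full rank, where $i$, $j$ and $l$ are the indices of the three systematic nodes not among $j_1, \dots, j_{k-3}$. Note that this matrix has a block Vandermonde structure, where the blocks commute pairwise by Lemma 2. This fact, combined with the fact that block matrices with commuting blocks can be expanded in a manner similar to the element-wise determinant expansion, implies that the determinant of the above matrix is equal to

$$\left|\lambda_j P_j - \lambda_i P_i\right|\,\left|\lambda_l P_l - \lambda_i P_i\right|\,\left|\lambda_l P_l - \lambda_j P_j\right|.$$

Each factor is non-zero by the argument used in the second scenario, since $\lambda_i \neq \lambda_j$ whenever $i \neq j$. This completes the proof of the desired MDS property.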
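The whole of Case 2 can be checked exhaustively for a small instance. This Python sketch assumes the same index-shift form of $P_i$ (adding 1 modulo $n-k = 3$ to the $i$-th index coordinate) and takes $C_{j,i} = (\lambda_iP_i)^{j-k-1}$. The choices $k = 3$ (the smallest $k$ for which all five scenarios occur), $\lambda = (1, 2, 3)$ and the primes $p = 11$ and $p = 7$ are hypothetical, picked only for illustration. Over $\mathbb{F}_{11}$ ($11 \equiv 2 \pmod 3$) the only cube root of unity is 1, and the check should find every choice of $k$ nodes full rank; over $\mathbb{F}_7$ ($7 \equiv 1 \pmod 3$) the same $\lambda$ fails, because $\lambda_1\lambda_2^{-1} = 4$ is a nontrivial cube root of unity.

```python
from itertools import combinations, product


def shift_perm(k, q, i, m):
    """Index map of P_i^m (0-based i); assumption: add m mod q to index coordinate i."""
    idx = list(product(range(q), repeat=k))
    pos = {r: t for t, r in enumerate(idx)}
    out = []
    for r in idx:
        s = list(r)
        s[i] = (s[i] + m) % q
        out.append(pos[tuple(s)])
    return out


def perm_block(perm, scale, p):
    """Dense matrix scale * P over F_p; row t has its single entry in column perm[t]."""
    N = len(perm)
    M = [[0] * N for _ in range(N)]
    for t in range(N):
        M[t][perm[t]] = scale % p
    return M


def rank_mod_p(M, p):
    """Rank over the prime field F_p by Gaussian elimination."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(rank + 1, len(M)):
            f = M[r][c] % p
            if f:
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


def node_rows(node, k, q, lam, p):
    """Block row of a 1-based node: [0 .. I .. 0] if systematic, else C_{j,i} = lam_i^e P_i^e, e = j-k-1."""
    N = q ** k
    blocks = []
    for i in range(k):
        if node <= k:
            blocks.append(perm_block(list(range(N)), 1 if node - 1 == i else 0, p))
        else:
            e = node - k - 1
            blocks.append(perm_block(shift_perm(k, q, i, e), pow(lam[i], e, p), p))
    return [sum((b[t] for b in blocks), []) for t in range(N)]


def is_mds(k, q, lam, p):
    """True if every choice of k out of n = k + q nodes gives a full-rank matrix over F_p."""
    n, N = k + q, q ** k
    rows = {node: node_rows(node, k, q, lam, p) for node in range(1, n + 1)}
    for S in combinations(range(1, n + 1), k):
        M = [row for node in S for row in rows[node]]
        if rank_mod_p(M, p) != k * N:
            return False
    return True


def cube_roots_of_unity(p):
    """All x in F_p with x^3 = 1."""
    return [x for x in range(1, p) if pow(x, 3, p) == 1]


print(cube_roots_of_unity(11), is_mds(3, 3, [1, 2, 3], 11))
print(cube_roots_of_unity(7), is_mds(3, 3, [1, 2, 3], 7))
```

The failure over $\mathbb{F}_7$ occurs in the second scenario, which is why the field must be chosen so that 1 is its only cube root of unity.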